artificial chromosome (YAC) contig map. Next, we isolated and analysed all CAG/CTG repeats from this region and excluded them from involvement in BP disorder. Here, in the process of identifying all CCG/CGG repeats from the region, we isolated three potential CpG islands, one of which is located 1.5 kb upstream of a predicted exon of 3639 bp. Further analysis showed this was part of a novel CpG-associated, brain-expressed gene, which we called NCAG1 (Novel CpG Associated Gene 1). Mutation analysis of this positional and functional candidate identified two single nucleotide polymorphisms, neither of which was shown to be associated with the BP phenotype.
NOVEL BRAIN-EXPRESSED GENE AND PROTEIN ASSOCIATED WITH BIPOLAR DISORDER

FIELD OF THE INVENTION:

The invention is broadly concerned with the determination of genetic factors associated with psychiatric health. More particularly, the present invention is directed to a human gene which is linked to a mood disorder or related disorder in affected individuals and their families. Specifically, the present invention is directed to a gene located on chromosome 18 that is expressed in brain tissue and may be used as a diagnostic marker for bipolar disorder.

BACKGROUND OF THE INVENTION:
Pharmacogenetics background:

Every individual is a product of the interaction of their genes and the environment.

Pharmacogenetics is the study of how genetic differences influence the variability in patients' responses to drugs. Through the use of pharmacogenetics, we will soon be able to profile variations between individuals' DNA to predict responses to a particular medicine. Target validation that will predict a well-tolerated and effective medicine for a clinical indication in humans is a widely perceived problem, but the real challenge is target selection. A limited number of molecular target families have been identified, including receptors and enzymes, for which high-throughput screening is currently possible. A good target is one against which many compounds can be screened rapidly to identify active molecules (hits). These hits can be developed into optimized molecules (leads), which have the properties of well-tolerated and effective medicines.

Selection of targets that can be validated for a disease or clinical symptom is a major problem faced by the pharmaceutical industry. The best-validated targets are those that have already produced well-tolerated and effective medicines in humans (precedent targets). Many targets are chosen on the basis of scientific hypotheses and do not lead to effective medicines because the initial hypotheses are often subsequently disproved.

Two broad strategies are being used to identify genes and express their protein products for use as high-throughput targets. These approaches of genomics and genetics share technologies but represent distinct scientific tactics and investments. Discovery genomics uses the increasing number of databases of DNA sequence information to identify genes and families of genes for tractable or screenable targets that are not known to be genetically related to disease.

The advantage of information on disease-susceptibility genes derived from patients is that, by definition, these genes are relevant to the patients' genetic contributions to the disease. However, most susceptibility genes will not be tractable targets or amenable to high-throughput screening methods to identify active compounds.
The differential metabolism related to the relevant gene variants can be studied with focused functional genomic and proteomic technologies to discover mechanisms of disease development or progression.

Critical enzymes or receptors associated with the altered metabolism can be used as targets. Gene-to-function-to-target strategies that focus on the role of the specific susceptibility gene variants in appropriate cellular metabolism become important.
Data mining of sequences from the Human Genome Project and similar programmes with powerful bioinformatic tools has made it possible to identify gene families by locating domains that possess similar sequences. Genes identified by these genomic strategies generally require some sort of functional validation or relationship to a disease process. Technologies such as differential gene expression, transgenic animal models, proteomics, in situ hybridization and immunohistochemistry are used to imply relationships between a gene and a disease.

The major distinction between the genomic and genetic approaches is target selection, with genetically defined genes and variant-specific targets already known to be involved in the disease process. The current vogue of discovery genomics for nonspecific, wholesale gene identification, with each gene in search of a relationship to a disease, creates great opportunities for development of medicines.

It is also critical to realize that the core problem for drug development is poor target selection. The use of unproven screening technologies to imply disease-related validation, and the huge investment necessary to progress each selected gene to proof of concept in humans, are based on an unproven and cavalier use of the word "validation". Each failure is very expensive in lost time and money. For example, differential gene expression (DGE) and proteomics are screening technologies that are widely used for target validation. They detect different levels and/or patterns of gene and protein expression in tissues, which may be used to imply a relationship to a disease affecting that tissue.

Mood Disorder Background: 

Mood disorders or related disorders include but are not limited to the following disorders as defined in the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) taxonomy (DSM-IV codes in parentheses): mood disorders (296.XX, 300.4, 311, 301.13, 295.70), schizophrenia and related disorders (295.XX, 297.1, 298.8, 297.3, 298.9), anxiety disorders (300.XX, 309.81, 308.3), adjustment disorders (309.XX) and personality disorders (301.XX).

The present invention is particularly directed to genetic factors associated with a family of mood disorders known as bipolar (BP) spectrum disorders. Bipolar disorder (BP) is a severe psychiatric condition that is characterized by disturbances in mood, ranging from an extreme state of elation (mania) to a severe state of dysphoria (depression). Two types of bipolar illness have been described: type I BP illness (BPI), characterized by major depressive episodes alternating with phases of mania, and type II BP illness (BPII), characterized by major depressive episodes alternating with phases of hypomania. Relatives of BP probands have an increased risk for BP, unipolar disorder (patients only experiencing depressive episodes; UP), cyclothymia (minor depression and hypomania episodes; CY) as well as for schizoaffective disorders of the manic (SAm) and depressive (SAd) type. Based on these observations, BP, CY, UP and SA are classified as BP spectrum disorders.

The involvement of genetic factors in the etiology of BP spectrum disorders was suggested by family, twin and adoption studies (Tsuang and Faraone (1990), The Genetics of Mood Disorders, Baltimore, The Johns Hopkins University Press). However, the exact pattern of transmission is unknown. In some studies, complex segregation analysis supports the existence of a single major locus for BP (Spence et al. (1995), Am J Med Genet (Neuropsych Genet) pp 370-376). Other researchers propose a liability-threshold model, in which the liability to develop the disorder results from the additive combination of multiple genetic and environmental effects (McGuffin et al. (1994), Affective Disorders; Seminars in Psychiatric Genetics, Gaskell, London pp 110-127).

Due to the complex mode of inheritance, parametric and non-parametric linkage strategies are applied in families in which BP disorder appears to be transmitted in a Mendelian fashion. Early linkage findings on chromosomes 11p15 (Egeland et al. (1987), Nature pp 783-787) and Xq27-q28 (Mendlewicz et al. (1987), The Lancet 1 pp 1230-1232; Baron et al. (1987), Nature pp 289-292) have been controversial and initially could not be replicated (Kelsoe et al. (1989), Nature pp 238-243; Baron et al. (1993), Nature Genet pp 49-55). With the development of a human genetic map saturated with highly polymorphic markers and the continuous development of data analysis techniques, numerous new linkage searches were started. In several studies, evidence or suggestive evidence for linkage to particular regions on chromosomes 4, 12, 18, 21 and X was found (Blackwood et al. (1996), Nature Genetics pp 427-430; Craddock et al. (1994), Br J Psychiatry pp 355-358; Berrettini et al. (1994), Proc Natl Acad Sci USA pp 5918-5921; Straub et al. (1994), Nature Genetics pp 291-296; and Pekkarinen et al. (1995), Genome Research 2 pp 105-115). In order to test the validity of the reported linkage results, these findings have to be replicated in other, independent studies.

Recently, linkage of bipolar disorder to the pericentromeric region on chromosome 18 was reported (Berrettini et al. 1994). Also, a ring chromosome 18 with break-points and deleted regions at 18pter-p11 and 18q23-qter was reported in three unrelated patients with BP illness or related syndromes (Craddock et al. 1994). The chromosome 18p linkage was replicated by Stine et al. (1995), Am J Hum Genet 22 pp 1384-1394, who also reported suggestive evidence for a locus on 18q21.2-q21.32 in the same study.

Interestingly, Stine et al. observed a parent-of-origin effect: the evidence of linkage was strongest in the paternal pedigrees, in which the proband's father or one of the proband's father's sibs is affected. Several studies described anticipation in families transmitting BP disorder (McInnis et al 1993, Nylander et al 1994), suggesting the involvement of trinucleotide repeat expansions (TREs), since a number of diseases caused by an expansion of a CAG/CTG, a CCG/CGG or a GAA/TTC repeat show anticipation (reviewed by Margolis et al 1999). Previous efforts to find potentially expanded repeats have primarily focused on CAG/CTG repeats, although the search for CCG/CGG repeats is increasing (Kleiderlein et al 1998, Mangel et al 1998, Eichhammer et al 1998, Kaushik et al 2000). Previously, we reported on a new method for the region-specific isolation of triplet repeats: triplet repeat YAC fragmentation (Del Favero et al 1999). This proved to be a valid method for the isolation of CAG/CTG repeats, and using this method we excluded the involvement of CAG/CTG repeats from within 18q21.33-q23 in bipolar disorder (Goossens et al 2000). The present invention adapted the method for the region-specific isolation of CCG/CGG repeats and applied it to the chromosome 18q21.33-q23 BP candidate region.

SUMMARY OF THE INVENTION:

The present invention is directed to a novel gene and the protein encoded by that gene.

The novel gene is located in an 8.9 cM chromosomal region between D18S68 and D18S979 at 18q21.33-q23. A physical map of this region was constructed using yeast artificial chromosomes (YACs) (Verheyen et al 1999).

The previously described method was adapted for the region-specific isolation of CCG/CGG repeats and applied to the chromosome 18q21.33-q23 BP candidate region. Three potential CpG islands were isolated, one of which is located 1.5 kb upstream of a predicted exon of 3639 bp. Further analysis showed this was part of a novel CpG-associated, brain-expressed gene, herein called NCAG1 (Novel CpG Associated Gene 1). Mutation analysis of this positional and functional candidate identified two single nucleotide polymorphisms, which may be useful as diagnostic markers for the BP phenotype.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1: List of all human ESTs found by BLASTN alignment searches of dbEST. ESTs are named with their GenBank accession numbers. I.M.A.G.E. Consortium [LLNL] cDNA clones (Lennon et al 1996) are named with their RZPD clone ID.

Figure 2: Minimal YAC tiling path of the 18q21.33-q23 BP candidate region (Verheyen et al 1999). The YACs are represented by solid lines, the CCG/CGG fragmentation products by dotted lines. YAC sizes, in brackets, are estimated by PFGE analysis. Solid circles indicate positive STS/STR hits. Shaded boxes highlight the CCG/CGG repeat and the three CpG islands isolated by YAC fragmentation.

Figure 3: Feature map of NCAG1. a) Features predicted by bioinformatics. They encompass the CpG island as predicted by LCP (Huang 1994) and CPG (Larsen et al 1992), the ORF or exon as predicted by Grail (Uberbacher & Mural 1991) and Genscan (Burge & Karlin 1997), the transcription start site (TSS) as predicted by Proscan (Prestridge 1995) and the relevant polyadenylation signals as predicted by PolyAH (Salamov & Solovyev 1997). The numbers below the features indicate the scores as returned by Proscan and PolyAH. b) Alignment of EST hits. ESTs are named with their GenBank accession numbers. c) Alignment of cDNA clones. I.M.A.G.E. Consortium [LLNL] cDNA clones (Lennon et al 1996) are named with their RZPD clone ID. d) RT-PCR products. The grey bars represent the RT-PCR products; the thin black lines represent the sequences obtained on the nested PCRs.



DETAILED DESCRIPTION OF THE INVENTION:

The present invention is directed to a novel gene located in the 18q candidate region of chromosome 18. More specifically, the gene is located in an 8.9 cM region between D18S68 and D18S979 at 18q21.33-q23.
The gene is located in a chromosomal region associated with mood disorders such as bipolar spectrum disorders and may therefore be useful as a diagnostic marker for bipolar spectrum disorders. The region in question, when removed from the totality of the human genome, may also be used to locate, isolate and sequence other genes which influence psychiatric health and mood.

Isolation and identification of the novel gene:

Standard procedures well known to one skilled in the art were applied to the identified YAC clones and, where applicable, to the DNA from an individual afflicted with a mood disorder as defined herein, in the process of identifying and characterizing the relevant gene. For example, the inventors are able to make use of the previously identified apparent association between trinucleotide repeat expansions (TREs) within the human genome and the phenomenon of anticipation in mood disorders (Lindblad et al. (1995), Neurobiology of Disease 2 pp 55-62 and O'Donovan et al. (1995), Nature Genetics 10 pp 380-381) to screen for TREs in the selected YAC clones in order to identify candidate genes in the region of interest on human chromosome 18. A variety of other known procedures can also be applied to the said YAC clones to identify the candidate gene, as discussed below.

Accordingly, in a first aspect the present invention comprises the use of an 8.9 cM region of human chromosome 18q disposed between polymorphic markers D18S68 and D18S979, or a fragment thereof, for identifying at least one human gene, including mutated and polymorphic variants thereof, which is associated with mood disorders or related disorders as defined above. As will be described below, the present inventors have identified this candidate region of chromosome 18q for such a gene by analysis of co-segregation of bipolar disease in family MAD31 with 12 STR polymorphic markers previously located between D18S51 and D18S61 and subsequent LOD score analysis. Particular YACs covering the candidate region which may be used in accordance with the present invention are 961h9, 942c3, 766f12, 731c7, 907e1, 752g8 and 717d3, preferred ones being 961h9, 766f12 and 907e1 since these provide the minimum tiling path across the candidate region. Suitable YAC clones for use are those having an artificial chromosome spanning the refined candidate region between D18S68 and D18S979.
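By way of illustration only, the LOD score analysis referred to above can be sketched for the simplest case of fully informative, phase-known meioses. This is an assumption: the text does not state the genetic model, penetrance or phase information used for family MAD31, and real pedigree analyses are normally run with dedicated linkage software. The function name `lod_phase_known` is hypothetical. For R recombinant and NR non-recombinant meioses, Z(θ) = log10[θ^R (1 - θ)^NR / 0.5^(R+NR)]; a maximum LOD score of 3 or more is conventionally taken as significant evidence of linkage.

```python
import math


def lod_phase_known(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score Z(theta) for fully informative, phase-known meioses.

    Z(theta) = log10(theta**R * (1 - theta)**NR) - log10(0.5**(R + NR)).
    Illustrative sketch only; not the analysis actually used in the text.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    z = n * math.log10(2.0)  # subtracting log10(0.5 ** n) adds n * log10(2)
    if recombinants:
        if theta == 0.0:
            return float("-inf")  # any recombinant rules out theta = 0
        z += recombinants * math.log10(theta)
    z += nonrecombinants * math.log10(1.0 - theta)
    return z


if __name__ == "__main__":
    # Ten non-recombinant meioses give Z(0) = 10 * log10(2), about 3.01.
    print(round(lod_phase_known(0, 10, 0.0), 2))  # 3.01
    # Scan a few recombination fractions for 1 recombinant out of 10 meioses.
    for t in (0.01, 0.05, 0.1, 0.2, 0.3):
        print(t, round(lod_phase_known(1, 9, t), 3))
```

In practice Z(θ) is evaluated over a grid of recombination fractions and the maximum is reported, as the scan in the usage block does.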

There are a number of methods which can be applied to the candidate regions of chromosome 18q as defined above, whether or not present in a YAC, to identify a candidate gene or genes associated with mood disorders or related disorders. For example, as noted above, there is an apparent association between the extent of trinucleotide repeat expansions (TREs) in the human genome and the presence of mood disorders.

Accordingly, in a third aspect the present invention comprises a method of identifying at least one human gene, including mutated and polymorphic variants thereof, which is associated with a mood disorder or related disorder as defined herein, which comprises detecting nucleotide triplet repeats in the region of human chromosome 18q disposed between polymorphic markers D18S68 and D18S979.

An alternative method of identifying said gene or genes comprises fragmenting a YAC clone comprising a portion of human chromosome 18q disposed between polymorphic markers D18S60 and D18S61, for example one or more of the seven aforementioned YAC clones, and detecting any nucleotide triplet repeats in said fragments, in particular repeats of CAG or CTG. Nucleic acid probes comprising at least 5, and preferably at least 10, CTG and/or CAG triplet repeats are a suitable means of detection when appropriately labelled. Trinucleotide repeats may also be determined using the known RED (repeat expansion detection) system (Schalling et al. (1993), Nature Genetics pp 135-139).
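Where the YAC fragments or sub-clones have been sequenced, the detection of CAG/CTG runs described above can also be carried out in silico. The following Python sketch is illustrative only; the function name `triplet_runs` is hypothetical, and the default threshold of 5 repeat units simply mirrors the minimum probe length given in the text. Because a CAG run on one strand is a CTG run on the complementary strand, scanning one strand for both units covers both strands.

```python
import re


def triplet_runs(seq: str, unit: str, min_units: int = 5) -> list[tuple[int, int]]:
    """Return (0-based start, repeat count) for each uninterrupted run of
    `unit` repeated at least `min_units` times in `seq`."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{unit}){{{min_units},}}")  # e.g. (?:CAG){5,}
    return [
        (m.start(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Hypothetical fragment: 12 CAG units, a spacer, then 4 CTG units.
    fragment = "TT" + "CAG" * 12 + "AA" + "CTG" * 4 + "G"
    print(triplet_runs(fragment, "CAG", min_units=4))  # [(2, 12)]
    print(triplet_runs(fragment, "CTG", min_units=4))  # [(40, 4)]
```

The same function applies unchanged to CCG/CGG runs by passing `unit="CCG"` or `unit="CGG"`.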

In a fourth embodiment the invention comprises a method of identifying at least one gene, including mutated and polymorphic variants thereof, which is associated with a mood disorder or related disorder and which is present in a YAC clone spanning the region of human chromosome 18q between polymorphic markers D18S60 and D18S61, the method comprising the step of detecting the expression product of a gene incorporating nucleotide triplet repeats by use of an antibody capable of recognizing a protein with an amino acid sequence comprising a string of at least 8, but preferably at least 12, contiguous glutamine residues. Such a method may be implemented by sub-cloning YAC DNA, for example from the seven aforementioned YAC clones, into a human DNA expression library. A preferred means of detecting the relevant expression product is by use of a monoclonal antibody, in particular mAb 1C2, the preparation and properties of which are described in International Patent Application Publication No WO 97/17445.
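Where a candidate protein sequence is already available, the glutamine-string criterion above can be screened computationally before any antibody work. The sketch below is an illustrative assumption, not part of the described method; the function name `polyglutamine_tracts` is hypothetical. It reports every uninterrupted run of glutamine (Q) residues of at least a given length, using the 8-residue minimum from the text as default.

```python
import re


def polyglutamine_tracts(protein: str, min_length: int = 8) -> list[tuple[int, int]]:
    """Return (0-based start, length) for each run of at least `min_length`
    consecutive glutamine (Q) residues in a one-letter protein sequence."""
    pattern = re.compile(f"Q{{{min_length},}}")  # e.g. Q{8,}
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical protein fragment with one 12-residue and one 3-residue Q run.
    fragment = "MA" + "Q" * 12 + "PL" + "QQQ"
    print(polyglutamine_tracts(fragment))                 # [(2, 12)]
    print(polyglutamine_tracts(fragment, min_length=12))  # [(2, 12)]
```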

Further embodiments of the present invention relate to methods of identifying the relevant gene or genes which involve the sub-cloning of YAC DNA as defined above into vectors such as BAC (bacterial artificial chromosome) or PAC (P1 or phage artificial chromosome) or cosmid vectors such as exon-trap cosmid vectors. The starting point for such methods is the construction of a contig map of the region of human chromosome 18q between polymorphic markers D18S60 and D18S61. To this end the present inventors have sequenced the end regions of the fragment of human DNA in each of the seven aforementioned YAC clones, and these sequences are disclosed herein. Following sub-cloning of YAC DNA into other vectors as described above, probes comprising these end sequences or portions thereof, in particular those sequences shown in Figures 1 to 11 herein, together with any known sequence-tagged site (STS) in this region, as described in the YAC clone contig shown herein, can be used to detect overlaps between said sub-clones, and a contig map can be constructed. Also, the known sequences in the current YAC contig can be used for the generation of contig map sub-clones.
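The overlap-detection step lends itself to a simple computational representation: if each sub-clone is scored for the STS markers or end-sequence probes it contains, clones sharing at least one marker can be grouped into contigs. The sketch below assumes that shared marker content implies overlap and ignores false-positive hybridizations; the function name, clone names and marker names are hypothetical. It only groups clones and does not order them within a contig, which would additionally require the known marker order.

```python
def build_contigs(sts_content: dict[str, set[str]]) -> list[list[str]]:
    """Group clones into contigs: clones sharing any marker end up together
    (union-find over clones, linked through shared markers)."""
    parent = {clone: clone for clone in sts_content}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_clone_with: dict[str, str] = {}
    for clone, markers in sts_content.items():
        for marker in markers:
            if marker in first_clone_with:
                a, b = find(clone), find(first_clone_with[marker])
                if a != b:
                    parent[a] = b  # merge the two contigs
            else:
                first_clone_with[marker] = clone

    groups: dict[str, list[str]] = {}
    for clone in sts_content:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical sub-clones scored for hypothetical STS markers s1..s4.
    content = {"A": {"s1", "s2"}, "B": {"s2", "s3"}, "C": {"s4"}, "D": {"s3"}}
    print(build_contigs(content))  # [['A', 'B', 'D'], ['C']]
```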

One route by which a gene or genes associated with a mood disorder or related disorder can be identified is the known technique of exon trapping. This is an artificial RNA splicing assay, which in current protocols most often makes use of a specialized exon-trap cosmid vector. The vector contains an artificial minigene consisting of a segment of the SV40 genome containing an origin of replication and a powerful promoter sequence, two splicing-competent exons separated by an intron which contains a multiple cloning site, and an SV40 polyadenylation site.
The YAC DNA is sub-cloned into the exon-trap vector and the recombinant DNA is transfected into a strain of mammalian cells. Transcription from the SV40 promoter results in an RNA transcript which normally splices to include the two exons of the minigene. If the cloned DNA itself contains a functional exon, it can be spliced to the exons present in the vector's minigene. Using reverse transcriptase, a cDNA copy can be made, and using specific PCR primers, splicing events involving exons of the insert DNA can be identified. Such a procedure can identify coding regions in the YAC DNA, which can be compared to the equivalent regions of DNA from a person afflicted with a mood disorder or related disorder to identify the relevant gene.

Accordingly, in a fifth aspect the invention comprises a method of identifying at least one human gene, including mutated variants and polymorphisms thereof, which is associated with a mood disorder or related disorder, which comprises the steps of:

(1) transfecting mammalian cells with exon-trap cosmid vectors prepared and mapped as described above;

(2) culturing said mammalian cells in an appropriate medium;

(3) isolating RNA transcripts expressed from the SV40 promoter;

(4) preparing cDNA from said RNA transcripts;

(5) identifying splicing events involving exons of the DNA sub-cloned into said exon-trap cosmid vectors to elucidate positions of coding regions in said sub-cloned DNA;

(6) detecting differences between said coding regions and equivalent regions in the DNA of an individual afflicted with said mood disorder or related disorder; and

(7) identifying said gene or mutated or polymorphic variant thereof which is associated with said mood disorder or related disorders.

As an alternative to exon trapping, the YAC DNA may be sub-cloned into BAC, PAC, cosmid or other vectors and a contig map constructed as described above. There are a variety of known methods by which the position of relevant genes on the sub-cloned DNA can be established, as follows:

(a) cDNA selection or capture (also called direct selection): this method involves forming genomic DNA/cDNA heteroduplexes by hybridizing a cloned DNA (e.g. an insert of a YAC DNA) to a complex mixture of cDNAs, such as the inserts of all cDNA clones from a specific (e.g. brain) cDNA library. Related sequences will hybridize and can be enriched in subsequent steps using biotin-streptavidin capture and PCR (or related techniques);

(b) hybridization to mRNA/cDNA: a genomic clone (e.g. the insert of a specific cosmid) can be hybridized to a Northern blot of mRNA from a panel of cultured cell lines or against appropriate (e.g. brain) cDNA libraries. A positive signal can indicate the presence of a gene within the cloned fragment;

(c) CpG island identification: CpG or HTF islands are short (about 1 kb) hypomethylated GC-rich (> 60%) sequences which are often found at the 5' ends of genes. CpG islands often have restriction sites for several rare-cutter restriction enzymes. Clustering of rare-cutter restriction sites is indicative of a CpG island and therefore of a possible gene. CpG islands can be detected by hybridization of a DNA clone to Southern blots of genomic DNA digested with rare-cutting enzymes, or by island-rescue PCR (isolation of CpG islands from YACs by amplifying sequences between islands and neighbouring Alu repeats);

(d) zoo-blotting: hybridizing a DNA clone (e.g. the insert of a specific cosmid) at reduced stringency against a Southern blot of genomic DNA samples from a variety of animal species. Detection of hybridization signals can suggest conserved sequences, indicating a possible gene.

Accordingly, in a sixth aspect the invention comprises a method of identifying at least one human gene, including mutated and polymorphic variants thereof, which is associated with a mood disorder or related disorder, which comprises the steps of:

(1) sub-cloning the YAC DNA as described above into a cosmid, BAC, PAC or other vector;

(2) using the nucleotide sequences shown in any one of Figures 1 to 11, or any other sequence-tagged site (STS) in this region as in the YAC clone contig described herein, or part thereof consisting of not less than 14 contiguous bases, or the complement thereof, to detect overlaps amongst the sub-clones and construct a map thereof;

(3) identifying the position of genes within the sub-cloned DNA by one or more of CpG island identification, zoo-blotting, or hybridization of the sub-cloned DNA to a cDNA library or a Northern blot of mRNA from a panel of cultured cell lines;

(4) detecting differences between said genes and equivalent regions of the DNA of an individual afflicted with a mood disorder or related disorder; and

(5) identifying said gene which is associated with said mood disorders or related disorders.
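Step (3) above includes CpG island identification. Where sub-cloned DNA has been sequenced, a first-pass in silico scan can complement the hybridization-based approaches. The following sketch is illustrative only and rests on stated assumptions: the GC threshold of 60% follows the description of CpG islands given above, while the 200 bp window and the observed/expected CpG ratio threshold of 0.6 are borrowed from the widely used Gardiner-Garden and Frommer criteria rather than from this text. The function name `cpg_islands` is hypothetical; the CPG and LCP programs cited for Figure 3 are the tools actually referred to in this document.

```python
def cpg_islands(
    seq: str,
    window: int = 200,
    step: int = 1,
    min_gc: float = 0.6,
    min_obs_exp: float = 0.6,
) -> list[tuple[int, int]]:
    """Return merged (start, end) intervals, 0-based and end-exclusive, of
    windows whose GC fraction exceeds `min_gc` and whose observed/expected
    CpG ratio exceeds `min_obs_exp`."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" occurrences can never overlap each other
        gc_fraction = (c + g) / window
        # Observed/expected CpG = CpG count * window length / (C count * G count)
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + window))
    # Merge overlapping or touching qualifying windows into islands.
    islands: list[tuple[int, int]] = []
    for s, e in hits:
        if islands and s <= islands[-1][1]:
            islands[-1] = (islands[-1][0], max(islands[-1][1], e))
        else:
            islands.append((s, e))
    return islands


if __name__ == "__main__":
    # Hypothetical test sequence: a 600 bp CG-rich block flanked by AT-rich DNA.
    test_seq = "AT" * 300 + "CG" * 300 + "AT" * 300
    print(cpg_islands(test_seq, window=200, step=50))  # [(550, 1250)]
```

The reported island extends slightly beyond the CG-rich block because windows that straddle its edges still exceed both thresholds; a smaller `step` gives finer boundaries at higher cost.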

If the cloned YAC DNA is sequenced, computer analysis can be used to establish the 
presence of relevant genes. Techniques such as homology searching and exon 
prediction may be applied. 

Once a candidate gene has been isolated in accordance with the methods of the invention, more detailed comparisons may be made between the gene from a normal individual and one afflicted with a mood disorder such as a bipolar spectrum disorder. For example, there are two methods, described as "mutation testing", by which a mutation or polymorphism in a DNA sequence can be identified. In the first, the DNA sample is tested for the presence or absence of one specific mutation, but this requires knowledge of what the mutation might be. In the second, a sample of DNA is screened for any deviation from a standard (normal) DNA. This latter method is more useful for identifying candidate genes where a mutation has not been identified in advance. In addition, the following techniques may be applied to a gene identified by the above-described methods to identify differences between genes from normal or healthy individuals and those afflicted with a mood disorder or related disorder:

(a) Southern blotting techniques: a clone is hybridized to nylon membranes containing genomic DNA of patients and healthy individuals digested with different restriction enzymes. Large differences between patients and healthy individuals can be visualized using a radioactive labelling protocol;

(b) heteroduplex mobility in polyacrylamide gels: this technique is based on the fact that the mobility of heteroduplexes in non-denaturing polyacrylamide gels is less than the mobility of homoduplexes. It is most effective for fragments under 200 bp;

(c) single-strand conformational polymorphism analysis (SSCP or SSCA): single-stranded DNA folds up to form complex structures that are stabilized by weak intramolecular bonds. The electrophoretic mobilities of these structures on non-denaturing polyacrylamide gels depend on their chain lengths and on their conformation;

(d) chemical cleavage of mismatches (CCM): a radiolabelled probe is hybridized to the test DNA, and mismatches are detected by a series of chemical reactions that cleave one strand of the DNA at the site of the mismatch. This is a very sensitive method and can be applied to kilobase-length samples;

(e) enzymatic cleavage of mismatches: the assay is similar to CCM, but the cleavage is performed by certain bacteriophage enzymes;

(f) denaturing gradient gel electrophoresis: in this technique, DNA duplexes are forced to migrate through an electrophoretic gel in which there is a gradient of increasing amounts of a denaturant (chemical or temperature). Migration continues until the DNA duplexes reach a position on the gel where the strands melt and separate, after which the denatured DNA does not migrate much further. A single base pair difference between a normal and a mutant DNA duplex is sufficient to cause them to migrate to different positions in the gel;

(g) direct DNA sequencing.
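When direct sequencing is used to screen for any deviation from a standard (normal) DNA, the final comparison reduces to listing the positions at which a patient sequence differs from the reference. The sketch below assumes that the two sequences have already been aligned to equal length with no insertions or deletions; it skips uncalled bases (N) and reports heterozygous positions exactly as called, for example an IUPAC code such as R for A/G. The function name `list_substitutions` is hypothetical.

```python
def list_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every
    position where a pre-aligned sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    diffs = []
    for pos, (ref_base, sample_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if sample_base == "N" or ref_base == sample_base:
            continue  # uncalled base or identical base: nothing to report
        diffs.append((pos, ref_base, sample_base))
    return diffs


if __name__ == "__main__":
    # Hypothetical reference and patient reads, already aligned.
    print(list_substitutions("ACGTACGT", "ACGANCGT"))  # [(4, 'T', 'A')]
    print(list_substitutions("ACGTACGT", "ACGTACRT"))  # [(7, 'G', 'R')]
```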

It will be appreciated that, with respect to the methods described herein, in the step of detecting differences between coding regions from the YAC and the DNA of an individual afflicted with a mood disorder or related disorder, the said individual may be anybody with the disorder and not necessarily a member of family MAD31.

In accordance with further aspects the present invention provides an isolated human gene and variants thereof associated with a mood disorder or related disorder and which is obtainable by any of the above-described methods, an isolated human protein encoded by said gene and a cDNA encoding said protein.

Once a gene has been identified, a number of methods are available to determine the function of the encoded protein. These methods are described by Eisenberg et al (Nature, 15 June 2000), which is herein incorporated by reference. One method involves a computational approach that reveals functional linkages from genome sequences and is called the gene neighbor method. If in several genomes the genes that encode two proteins are neighbors on the chromosome, the proteins tend to be functionally linked. This method can be powerful in uncovering functional linkages in prokaryotes, where operons are common, but also shows promise for analysing interacting proteins in eukaryotes.
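The gene neighbor method can be sketched as a count, over several genomes, of how often two genes are adjacent. The sketch assumes that genes have already been assigned to orthologous families shared between genomes (the family labels below are hypothetical), treats each chromosome as a linear list, and ignores strand and circular-genome wraparound. The function name `conserved_neighbors` is hypothetical; pairs adjacent in at least `min_genomes` genomes are reported as candidate functional linkages.

```python
from collections import Counter


def conserved_neighbors(genomes: list[list[str]], min_genomes: int = 2) -> dict[tuple[str, str], int]:
    """Count, for each unordered gene pair, the number of genomes in which
    the two genes are adjacent; keep pairs seen in >= min_genomes genomes."""
    counts: Counter = Counter()
    for order in genomes:
        # Each adjacent pair is counted at most once per genome.
        pairs = {tuple(sorted(pair)) for pair in zip(order, order[1:])}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


if __name__ == "__main__":
    # Hypothetical gene orders in three genomes, using shared family labels.
    genomes = [["a", "b", "c", "d"], ["b", "a", "x", "c"], ["d", "c", "a", "b"]]
    print(conserved_neighbors(genomes))  # {('a', 'b'): 3, ('c', 'd'): 2}
```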

Examples:
Example 1

A: Triplet repeat isolation

CCG/CGG YAC fragmentation vectors were constructed by cloning blunted (CCG)10/(CGG)10 adapters into the blunted SphI site of the previously described pDV1 basic vector (Del Favero et al 1999). Sequencing determined that fragmentation vectors pDVCCG and pDVCGG have the adapter sequence in a 5'-(CCG)10-3' and a 5'-(CGG)10-3' orientation, respectively.

Using these vectors, CCG/CGG repeats and flanking sequences were isolated by YAC fragmentation as described (Del Favero et al 1999).

B: Characterisation of the structure of the NCAG1 gene.

I.M.A.G.E. Consortium [LLNL] cDNA clones (Lennon et al 1996) IMAGp998A136826Q2, IMAGp998A154307Q2, IMAGp998B194346Q2, IMAGp998D126826Q2, IMAGp998D193628Q2, IMAGp998F131866Q2, IMAGp998H201815Q2, IMAGp998K235214Q2, IMAGp998L153967Q2 and IMAGp998N06839Q2 were ordered from RZPD Deutsches Ressourcenzentrum für Genomforschung GmbH (Heubnerweg 6, 14059 Berlin-Charlottenburg, Germany). Cultures starting from single colonies were grown and plasmids were prepared with the Wizard Plus SV Minipreps DNA Purification System (Promega, Madison, WI). DNA sequencing was performed with the dideoxynucleotide sequencing method using a DNA sequencing kit (Perkin-Elmer, Foster City, CA) and analysed on an ABI PRISM 377 DNA Sequencer (Perkin-Elmer, Foster City, CA) or an ABI PRISM 3700 DNA Analyser (Perkin-Elmer, Foster City, CA).

For the RT-PCR reactions, mRNA from SH-SY5Y cells was prepared using the
µMACS mRNA Isolation Kit (Miltenyi Biotec, Bergisch Gladbach, Germany). After
DNase I treatment (Promega, Madison, WI), the RT reaction was primed with
oligo(dT) primers and performed with the SuperScript Preamplification System for First
Strand cDNA Synthesis (GibcoBRL, N.V. Life Technologies, Merelbeke, Belgium). First-strand
cDNA was used in long-range PCR reactions with TaKaRa LA Taq (Takara Shuzo Co.,
Otsu, Shiga, Japan). PCR products were reamplified with nested primers and
sequenced as described above.

C: Characterisation of the expression pattern of the NCAG1 gene. 

Genepool cDNA (Invitrogen, Carlsbad, CA) from brain, fetal brain, placenta, liver,
testis and lung was used as a cDNA mapping panel. The Human Brain Multiple Tissue
Northern (MTN) Blot IV (Clontech, Palo Alto, CA) was used for radioactive
hybridisation in the accompanying ExpressHyb solution according to the manufacturer's
instructions. A zoo blot was prepared by digesting 10 µg of genomic DNA to completion
with HindIII, running it on a 1% agarose gel in TAE buffer and performing a Southern blot. A
PCR product containing the ORF of the NCAG1 gene was radioactively labelled and
hybridised at 65 °C.

D: Mutation analysis of the NCAG1 gene. 

Overlapping PCR products of approximately 600 bp were generated and sequenced as
described above. Both identified polymorphisms were detected by digesting the PCR
product with HinfI and electrophoresing the fragments on precast ExcelGel gels on a
Multiphor II electrophoresis system (Amersham Pharmacia Biotech AB, Uppsala,
Sweden).
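The PCR-RFLP readout can be sketched in a few lines. HinfI recognises 5'-GANTC-3' and cuts after the G. The sketch assumes a linear amplicon and reports top-strand fragment lengths only, ignoring the 3-nt 5' overhang. As a hypothetical mini-amplicon it uses the 33 nt of the NCAG1 ORF encoding codons 671-681 (ORF positions 2011-2043), read from the sequence listing. The C to T change at ORF position 2017 turns ...GAACCC... into ...GAATCC..., which creates a HinfI site. The same holds at position 2824, where ...GAACCCAAT... becomes ...GAATCCAAT.... This is an illustration, not the authors' analysis pipeline.

```python
import re

# HinfI recognition site 5'-GANTC-3' (N = any base); the lookahead allows overlapping matches.
HINFI_SITE = re.compile(r"(?=GA[ACGT]TC)")


def hinfi_fragments(amplicon):
    """Return fragment lengths after complete HinfI digestion of a linear amplicon.

    Cuts are placed after the G of each GANTC site on the top strand (G^ANTC);
    fragment lengths therefore ignore the 3-nt 5' overhang.
    """
    seq = amplicon.upper()
    cuts = [m.start() + 1 for m in HINFI_SITE.finditer(seq)]
    edges = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(edges, edges[1:])]


def substitute(seq, offset, base):
    """Return `seq` with the base at 0-based `offset` replaced by `base`."""
    return seq[:offset] + base + seq[offset + 1:]


# ORF positions 2011-2043 (codons 671-681); ORF position 2017 is offset 6 here.
reference = "ATGGAACCCACAATCACAAGAATTGCATATGTC"
variant = substitute(reference, 6, "T")  # C>T, Pro673Ser

print(hinfi_fragments(reference))  # [33]    -> uncut
print(hinfi_fragments(variant))    # [4, 29] -> the T allele creates a HinfI site
```

In the real ~600 bp amplicons, any constant HinfI sites elsewhere would add invariant bands. Only the allele-specific site distinguishes the genotypes.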

E: CCG/CGG YAC fragmentation

CCG/CGG YAC fragmentation was applied to YACs 961h9, 766f12 and
907e1 (Goossens et al 2000). Size determination by pulsed field gel electrophoresis
(PFGE) and Southern blot hybridisation resulted in 33 sets of equally sized fragmented
YAC clones. Sequencing of 112 fragmented YAC ends identified seven (out of 33) sets
of fragmented YACs with identical end sequences resulting from a specific
homologous recombination. One set (CCG7) was the result of fragmentation in the
(CGG)6 repeat in the 5' UTR of the CAP2 gene (GenBank acc. No L40377). A second
set (CCG6) contained a (CCG)n repeat and a third (CCG4) an imperfect CCCCG
repeat. The triplet repeat in the 5' UTR of the CAP2 gene has already been shown not to be
associated with BP disorder (Goossens et al 2000). The size of the CCG4 repeat was analysed in
12 BP and 12 UP patients, but only one allele was detected. The size of the CCG6 repeat was
not analysed, since it was too small to be polymorphic.

In-depth analysis showed that three of the seven sequences (CCG3, GenBank acc No ...;
CCG4, GenBank acc No ...; and CCG6, GenBank acc No ...) had a high CG content
(70-80 %) and a high CpG content (15-20 CpGs per 200 bp), but no additional CCG/CGG
repeats were found. Primer pairs for these potential CpG islands were used to
determine their position on the YAC contig (Figure 1). BLASTN analysis (Altschul et al
1990) of both CCG4 and CCG6 yielded hits with sequences of RPCI-11 BACs.
CCG4 gave a hit in a contig of 27150 bp of the working draft sequence of RPCI-11
BAC 29013 (GenBank acc No AC022662, GI: 7249117). CCG6 was part of the
complete sequence of RPCI-11 BAC 793 J2 (GenBank acc No AC009802).

F: Identification and in silico characterisation of the NCAG1 gene.

To find genes possibly associated with the potential CpG islands CCG4 and CCG6,
their surrounding BAC sequences were analysed using bioinformatic tools. To this end, the
27150 bp contig of BAC 29013 and the complete sequence of BAC 793 J2 were submitted
for analysis to the Rummage High-Throughput Sequence Annotation Server
(http://gen100.imb-jena.de/rummage/index.html).

First, LCP (Huang 1994) and CPG (Larsen et al 1992) recognised CpG islands of 1.2 kb
and 0.4 kb containing CCG4 and CCG6, respectively, supporting their classification as
CpG islands.

Next, the exon prediction programs Grail (Uberbacher & Mural 1991) and
Genscan (Burge & Karlin 1997) both predicted the presence of a 3639 bp exon, 1.5 kb
downstream of the 1.2 kb CpG island containing CCG4. This predicted exon
contains an open reading frame (ORF) that starts at an ATG start codon within an
almost perfect Kozak sequence and ends with a TAA stop codon. Other predicted
features are a transcription start site (TSS) 2352 bp upstream of the ORF (score 76.6
by Proscan (Prestridge 1995)) and polyadenylation signals at 3032, 3247, 4364, 5338
and 8266 bp downstream of the ORF (scores of 4.79, 3.83, 4.94, 4.93 and 6.27,
respectively, by PolyAH (Salamov & Solovyev 1997)) (Figure 2a).
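The ORF and Kozak observations can be checked with a small sketch. It scans the forward strand for the longest ATG-initiated ORF in any frame. It then reports the two Kozak positions usually regarded as most important: -3, where a purine is favoured, and +4, where G is favoured, counting the A of ATG as +1. The toy sequence is hypothetical. Its first 20 nt reproduce the context of the NCAG1 start codon in the sequence listing (...CA TGGATC ATG GCG TTA ATG...), and it is then closed with an artificial TAA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Returns None if no complete ORF is found.
    """
    s = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG downstream
    return best


def kozak_key_bases(seq, atg_start):
    """Return the bases at Kozak positions -3 and +4 (A of ATG = +1)."""
    s = seq.upper()
    minus3 = s[atg_start - 3] if atg_start >= 3 else None
    plus4 = s[atg_start + 3] if atg_start + 3 < len(s) else None
    return minus3, plus4


def is_strong_kozak(seq, atg_start):
    """True if -3 is a purine and +4 is G."""
    minus3, plus4 = kozak_key_bases(seq, atg_start)
    return minus3 in ("A", "G") and plus4 == "G"


toy = "CATGGATCATGGCGTTAATGTAAGG"
orf = longest_orf(toy)
print(orf, kozak_key_bases(toy, orf[0]), is_strong_kozak(toy, orf[0]))
# (8, 23) ('A', 'G') True
```

Production ORF finders also scan the reverse strand and apply a minimum length. Both are omitted here to keep the logic visible.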

BLASTN (Altschul et al 1990) searches of dbEST revealed
significant homology (> 97 %) to 21 human ESTs (Table 1, Figure 2b).
TBLASTX (Altschul et al 1997) searches of the GenBank non-redundant database (nr)
with the ORF showed extensive homology at the protein level with SART-2 (GenBank
Acc No NP_037484), a squamous cell carcinoma antigen recognised by T cells (Nakao
et al 2000). Weaker homology was found with a series of sulfotransferases. Analysis of
the 1212 amino acid sequence of the translated ORF with SMART (Simple Modular
Architecture Research Tool, V3.1) (Schultz et al 2000) revealed no known
domains apart from a cleavable signal peptide at positions 1-20 and two transmembrane
segments at positions 771-791 and 800-820. InterPro reported no significant hits,
although a BLASTP (Altschul et al 1997) search of the ProDom database showed homology
between the NCAG1 protein and the chondroitin-6-sulfotransferase domain (ProDom Acc
No PD042460).
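SMART relies on dedicated signal peptide and transmembrane predictors. A much simpler technique, swapped in here only for illustration, is a Kyte-Doolittle hydropathy scan. It flags candidate membrane-spanning stretches as 19-residue windows whose mean hydropathy exceeds about 1.6, the window and threshold suggested by Kyte and Doolittle. The test segment is residues 767-830 of NCAG1, translated from the CDS in the sequence listing. This scan is not claimed to reproduce SMART's segment boundaries.

```python
# Kyte & Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Yield (first_residue, mean_hydropathy) for each full window, 1-based."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    for start in range(len(values) - window + 1):
        yield start + 1, sum(values[start:start + window]) / window


def hydrophobic_stretches(protein, window=19, threshold=1.6, offset=0):
    """Merge overlapping above-threshold windows into (first, last) residue spans.

    `offset` shifts 1-based positions so that they refer to the full-length protein.
    """
    spans = []
    for first, mean in hydropathy_windows(protein, window):
        if mean <= threshold:
            continue
        first += offset
        last = first + window - 1
        if spans and first <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], last)  # extend the current span
        else:
            spans.append((first, last))
    return spans


# NCAG1 residues 767-830, translated from the CDS in the sequence listing.
segment = ("PFGFKFNIAVGLILCI" "SLVILTFQWRFYLSFR"
           "KLMRWILILVIALWFI" "ELLDVWSTCSQPICAK")
print(hydrophobic_stretches(segment, offset=766))
```

A 19-residue window smooths the profile. Closely spaced helices such as the two predicted here (771-791 and 800-820) may therefore merge into a single hydrophobic stretch, which is one reason dedicated predictors are preferred.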

G: Characterisation of the structural organisation of the NCAG1 gene. 

Based on the BLASTN EST hits, I.M.A.G.E. Consortium [LLNL] cDNA
clones (Lennon et al 1996) were ordered and sequenced. The sequences aligned with
the genomic sequence in the presumed 5' UTR (untranslated region), the ORF and the
presumed 3' UTR, indicating that these sequences are indeed transcribed (Figure 2c).
Alignment of the sequence of IMAGp998B194346Q2 with the genomic sequence
showed that an 865 bp fragment was missing from the cDNA. A detailed analysis of the
flanking sequences revealed consensus donor and acceptor splice sites,
indicating that this fragment is probably an intron. Clone IMAGp998D193628Q2 also
lacked a fragment of 1.9 kb relative to the genomic sequence, but consensus
splice sites were absent. Two clones, IMAGp998D193628Q2 and
IMAGp998A136826Q2, terminated exactly at the predicted polyadenylation signal, 4.4
kb downstream of the ORF. The sequences of clones IMAGp998A154307Q2,
IMAGp998D126826Q2 and IMAGp998F131866Q2 did not align with the genomic
sequence and were not analysed further.
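The splice-site argument can be sketched as follows. When a cDNA equals the genomic sequence with one block removed, the exact position of that block is ambiguous wherever the flanking bases repeat. The sketch therefore enumerates every consistent placement and accepts the block as a candidate intron only if one placement starts with GT and ends with AG, the canonical donor and acceptor dinucleotides. It checks only these core dinucleotides, not the full consensus. The exon and intron strings are hypothetical.

```python
def deletion_placements(genomic, cdna):
    """All (start, end) blocks whose removal from `genomic` yields `cdna` exactly."""
    g, c = genomic.upper(), cdna.upper()
    gap = len(g) - len(c)
    if gap <= 0:
        return []
    prefix = 0  # length of the common prefix
    while prefix < len(c) and g[prefix] == c[prefix]:
        prefix += 1
    suffix = 0  # length of the common suffix
    while suffix < len(c) and g[-1 - suffix] == c[-1 - suffix]:
        suffix += 1
    # Start s works iff g[:s] == c[:s] (s <= prefix) and g[s+gap:] == c[s:] (len(c)-s <= suffix).
    return [(s, s + gap) for s in range(max(0, len(c) - suffix), min(prefix, len(c)) + 1)]


def gt_ag_intron(genomic, cdna):
    """Return the first placement that starts with GT and ends with AG, else None."""
    g = genomic.upper()
    for start, end in deletion_placements(genomic, cdna):
        if g[start:start + 2] == "GT" and g[end - 2:end] == "AG":
            return start, end
    return None


exon1, intron, exon2 = "ATGCCA", "GTAAGCTTTTCTTTAG", "GCATGA"  # hypothetical
genomic = exon1 + intron + exon2
cdna = exon1 + exon2
print(deletion_placements(genomic, cdna))  # [(6, 22), (7, 23)]
print(gt_ag_intron(genomic, cdna))         # (6, 22)
```

A missing block with no GT-AG placement, like the 1.9 kb fragment of IMAGp998D193628Q2, returns None. That is consistent with treating it as a possible cloning artifact rather than an intron.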

Since cDNA clone sequencing did not yield a continuous sequence of the transcript,
primers were designed and used for RT-PCR experiments. Sequencing of different
overlapping RT-PCR products confirmed the presence of a transcript of at least 9 kb
containing the ORF of the predicted exon linked to the presumed 5' and 3' sequences
(Figure 2d). The 865 bp 5' intron was confirmed, and the 3' UTR was extended
to the predicted polyadenylation signal, 4.4 kb downstream of the ORF.
H: Characterisation of the expression pattern of the NCAG1 gene.

To investigate the expression profile of the NCAG1 gene, a long-range PCR spanning
the ORF was optimised on genomic DNA and applied to a cDNA mapping panel. The
fragment was present in cDNA from brain, fetal brain, placenta and
liver but could not be detected in cDNA from testis and lung. More detailed
information on expression in the brain was obtained by Northern blot hybridisation,
which showed a transcript of > 9.5 kb in all investigated tissues (lung, placenta,
small intestine, liver, kidney, skeletal muscle, heart, brain, uterus, trachea, thyroid,
stomach, spinal cord, prostate, mammary gland, lymph node, brain (whole), bladder,
adrenal gland, amygdala, caudate nucleus, corpus callosum, hippocampus, substantia
nigra, thalamus and total brain).

Zoo blot hybridisation under stringent conditions showed the presence of homologous
sequences in the genomic DNA of other mammals, including dog, pig, mouse, donkey, horse
and sheep.

I: Mutation analysis of the NCAG1 gene. 

Since this novel CpG-associated gene is brain-expressed and located in the
chromosome 18q21.3-q23 BP candidate region, a mutation analysis of the ORF was
performed on three patients and one escapee of the chromosome 18-linked family MAD31.
This identified two single nucleotide polymorphisms. The first is a C to T
transition at position 2017 of the ORF, changing amino acid (AA) 673 from proline to
serine (P673S). This polymorphism was found only in the healthy escapee. The second
polymorphism was found in all three patients. It is also a C to T transition, located at
position 2824, changing AA 942 from proline to serine (P942S). Analysis of this
polymorphism in family MAD31 showed that the T allele was present on the disease
haplotype.
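The relation between ORF positions, codon numbers and sequence-listing coordinates can be made explicit. The sketch assumes, as the sequence listing states, that the CDS starts at nucleotide 1507 of SEQ ID NO: 1. The function names are illustrative only.

```python
CDS_START = 1507  # first CDS nucleotide in SEQ ID NO: 1 (<222> (1507)..(5142))


def codon_of(orf_pos):
    """Map a 1-based ORF position to (codon number, position within codon)."""
    return (orf_pos - 1) // 3 + 1, (orf_pos - 1) % 3 + 1


def listing_coordinate(orf_pos, cds_start=CDS_START):
    """Convert a 1-based ORF position to a SEQ ID NO: 1 coordinate."""
    return cds_start + orf_pos - 1


def mutate_codon(codon, position, base):
    """Return `codon` with the base at 1-based `position` replaced."""
    return codon[:position - 1] + base + codon[position:]


for pos in (2017, 2824):
    number, within = codon_of(pos)
    print(pos, number, within, listing_coordinate(pos))
# 2017 673 1 3523
# 2824 942 1 4330

# Both codons read CCC (Pro) in the listing; C>T at the first position gives TCC (Ser).
print(mutate_codon("CCC", 1, "T"))  # TCC
```

Both transitions hit the first base of a CCC codon, which is why each changes proline to serine.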

Both polymorphisms were analysed by PCR-RFLP in an association study of 92 BP patients and
92 age-, sex- and ethnicity-matched controls. The P673S
polymorphism turned out to be common, with both alleles present at roughly
equal frequencies. The P942S polymorphism, however, was rare,
with the T allele present in only 3 BP patients and 2 controls.
Statistical analysis showed that the control population was in Hardy-Weinberg equilibrium
for both polymorphisms. No alleles, genotypes or haplotypes were found to be
associated with BP disorder.
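A Hardy-Weinberg check of the kind mentioned above can be sketched with the classic one-degree-of-freedom chi-square goodness-of-fit test. Genotype counts are not reported in the text, so the counts below are hypothetical. For a rare allele such as 942S the expected counts are small, and an exact test would be preferable.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg proportions.

    Returns (reference allele frequency, chi-square statistic, p-value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical genotype counts for 92 controls at a common biallelic SNP.
print(hwe_chi_square(23, 46, 23))  # (0.5, 0.0, 1.0)
```

A large statistic, for example from counts with no heterozygotes, yields a small p-value and signals departure from equilibrium, often a hint of genotyping error or population stratification.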
Since triplet repeat fragmentation has been shown to be a valid method for the
region-specific isolation of triplet repeats (Goossens et al 2000), we applied it to the
chromosome 18q21.33-q23 BP candidate region to isolate CCG/CGG repeats.

To this end, we first constructed a new set of fragmentation vectors, pDVCCG and
pDVCGG. Fragmentation experiments with these vectors gave transformation
and fragmentation efficiencies in the same range as those obtained with the CAG/CTG
fragmentation vectors pDVCAG and pDVCTG (data not shown). Application of
CCG/CGG fragmentation to YAC 961h9 resulted in the isolation of the (CGG)6 repeat
in the 5' UTR of CAP2. This repeat is adjacent to the previously reported (CAG)6 repeat
(Goossens et al 2000), where it was shown that this (CGG)6(CAG)6 repeat is
polymorphic but neither expanded in BP cases nor associated with BP disorder. Taken
together, the CCG/CGG YAC fragmentation data do not support CCG/CGG repeats
as disease-causing agents in chromosome 18q21.33-q23-linked BP disorder.

On the other hand, fragmentation experiments yielded three sequences (CCG3,
CCG4 and CCG6) with high CG (70 - 80 %) and CpG content but no
CCG/CGG repeat. CpG islands are usually defined as DNA regions of more than
200 bases with a CG content above 50 % and a ratio of observed to expected
CpG dinucleotides close to the value expected by chance, rather than the marked CpG
depletion seen in bulk genomic DNA. Therefore, CCG3, CCG4 and CCG6 can be
considered potential CpG islands. Analysis of the sequences surrounding CCG4 and
CCG6 with LCP (Huang 1994) and CPG (Larsen et al 1992) confirmed that in both cases the
fragmentation indeed occurred in a CpG island. Since CpG islands are
strongly associated with genes, in particular housekeeping and widely expressed
genes, these three sequences are likely to be located near genes of this class.
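The CpG island criteria quoted above can be expressed as a short check. As an assumption, the sketch uses the commonly cited Gardiner-Garden and Frommer thresholds: a length of at least 200 bp, G+C above 50 %, and an observed/expected CpG ratio above 0.6. LCP and CPG, the programs actually used, apply their own scoring.

```python
def cpg_stats(seq):
    """Return (G+C fraction, CpG count, observed/expected CpG ratio)."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CG dinucleotides cannot overlap, so str.count is exact
    expected = c * g / n if n else 0.0
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, cpg, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply length, G+C and observed/expected CpG thresholds to one sequence."""
    if len(seq) < min_len:
        return False
    gc_fraction, _, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))                  # (1.0, 100, 2.0)
print(looks_like_cpg_island("CG" * 100))      # True
print(looks_like_cpg_island("CCAAGG" * 40))   # False: GC-rich but no CpG at all
```

The last case shows why G+C content alone is not enough. A GC-rich but CpG-depleted sequence fails the observed/expected criterion.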

In the search for genes possibly associated with the isolated CpG islands, the exon
prediction programs Grail (Uberbacher & Mural 1991) and Genscan (Burge & Karlin
1997) both predicted the presence of a 3.6 kb exon downstream of the largest CpG
island isolated. Two facts argued strongly against a false positive prediction. The first
was that these two programs, based on different models, predicted exactly the same
exon. The second was the presence in the genomic DNA of an ORF running uninterrupted
for 3.6 kb and starting with an ATG in a Kozak consensus context. Additional evidence that
this exon was indeed transcribed came from a series of ESTs with very high
homology (97-100 %) to sequences in and around the ORF. Next, this
evidence was extended by sequencing the cDNA clones from which the ESTs
originated. The EST sequences were extended and corrected, and the homologies
increased to 99-100 %. The fact that the cDNA clones originated from different cDNA
libraries (Table 1) indicated that the gene is expressed in different tissues. RT-PCR
and Northern blot experiments finally confirmed that this ORF is
widely expressed, a typical characteristic of a CpG-associated gene.
cDNA clone sequencing yielded the complete sequence of seven human cDNA clones
aligning with NCAG1. In two cases a piece of genomic DNA was missing from the cDNA
sequence. Clone IMAGp998B194346Q2 lacked an 865 bp fragment (Figure 2c). Since
this fragment was flanked by splice donor and acceptor consensus sequences, and since
it was also missing from the RT-PCR products, there is sufficient evidence
to call it an intron. Clone IMAGp998D193628Q2 also lacked a 1.4 kb fragment
compared to the genomic sequence, but in this case no consensus splice sites were present.
Moreover, cDNA clones IMAGp998L153967Q2 and IMAGp998A136826Q2 contain
sequences located in the missing fragment of IMAGp998D193628Q2
(Figure 2c). These data, together with the fact that EST AA442543 lies entirely within
the missing fragment (Figure 2b) and that this fragment is present in the RT-PCR
products (Figure 2d), indicate that the missing fragment is more likely an artifact than an
intron.

EST homologies and cDNA clone sequencing showed that a series of cDNA clones
terminated at a predicted polyadenylation signal, 4.3 kb downstream of the ORF or
10.3 kb downstream of the predicted TSS. If the 865 bp 5' intron is taken into
account, the expected transcript size is 9.5 kb, which matches the size of the transcript
detected in the Northern blot experiment.
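As a worked check of the transcript size, the predicted coordinates from section F can be added up. The TSS lies 2352 bp upstream of the ORF, the ORF including its TAA stop codon spans 3639 bp, and the polyadenylation signal sits 4364 bp downstream of the ORF. The calculation treats the polyadenylation signal as the 3' end, an approximation that ignores the short distance to the cleavage site and the poly(A) tail, and then removes the 865 bp intron:

```latex
\begin{aligned}
L_{\mathrm{primary}} &\approx 2352 + 3639 + 4364 = 10\,355\ \text{bp} \\
L_{\mathrm{mRNA}} &\approx L_{\mathrm{primary}} - 865 = 9\,490\ \text{bp} \approx 9.5\ \text{kb}
\end{aligned}
```

This agrees with the 9.5 kb transcript detected by Northern blot.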

At the protein level, a cleavable signal peptide and two transmembrane domains are
predicted. If this is correct, the N-terminal and C-terminal ends will lie on the same
side of the membrane in which the protein is embedded. Cleavage of the signal peptide
leaves the mature N-terminus on the non-cytoplasmic side, and each of the two
transmembrane segments crosses the membrane once. The strong homology with the SART-2
protein is significant, but it provides no further clues to the potential function of the
novel protein.

The 2824T allele, present on the disease haplotype in the chromosome 18-linked family
MAD31, is a very rare allele with a frequency of 0.03. An association sample of this size
therefore has little statistical power to detect an effect of this allele, leaving open the
possibility that it confers an increased risk for BP disorder.
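The power argument can be made concrete with a two-sided Fisher exact test on carrier counts. The sketch reads the reported figures as 3 carriers of the 2824T allele among 92 patients and 2 among 92 controls, counting individuals rather than chromosomes. It ranks tables by hypergeometric probability using exact integer arithmetic, so ties are compared without rounding error.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def weight(x):  # numerator of P(cell a == x); the shared denominator is C(n, row1)
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return float(Fraction(extreme, comb(n, row1)))


# Rows: BP patients, controls; columns: 2824T carriers, non-carriers.
print(fisher_exact_two_sided(3, 89, 2, 90))            # 1.0
# Even if all five carriers had been patients (5 vs 0):
print(round(fisher_exact_two_sided(5, 87, 0, 92), 3))  # 0.059
```

With only five carriers in total, even a complete segregation of carriers into the patient group would not reach p < 0.05. The absence of association in this sample therefore cannot exclude a risk effect of the 2824T allele.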
CLAIMS

What is claimed is:

1. An isolated nucleic acid comprising the nucleotide sequence of SEQ ID NO: 1.

2. An isolated nucleic acid consisting essentially of the nucleotide sequence of SEQ
ID NO: 1.

3. An isolated nucleic acid comprising a nucleotide sequence that encodes the
amino acid sequence of SEQ ID NO: 2.

4. An isolated nucleic acid comprising the nucleotide sequence of SEQ ID NO: 3.

5. An isolated nucleic acid consisting essentially of the nucleotide sequence of SEQ
ID NO: 3.

6. An isolated nucleic acid consisting of the nucleotide sequence of SEQ ID NO: 1 or a
contiguous fragment thereof wherein said isolated nucleic acid encodes a polypeptide
having biological activity of bipolar disorder protein.

7. An isolated nucleic acid that hybridizes under high stringency conditions to a
nucleic acid having a sequence complementary to the nucleotide sequence of SEQ ID
NO: 1, wherein said isolated nucleic acid encodes a polypeptide having biological
activity.

8. An isolated nucleic acid that encodes a polypeptide having the biological activity,
said isolated nucleic acid consisting of a nucleotide sequence that is at least 90%
identical to the nucleotide sequence of SEQ ID NO: 1.

9. An isolated nucleic acid consisting of the nucleotide sequence of SEQ ID NO: 3 or a
contiguous fragment thereof wherein said isolated nucleic acid encodes a polypeptide
having biological activity.

10. An isolated nucleic acid that hybridizes under high stringency conditions to a
nucleic acid having a sequence complementary to the nucleotide sequence of SEQ ID
NO: 3, wherein said isolated nucleic acid encodes a polypeptide having the biological
activity.

11. An isolated nucleic acid that encodes a polypeptide having the biological activity,
said isolated nucleic acid consisting of a nucleotide sequence that is at least 90%
identical to the nucleotide sequence of SEQ ID NO: 3.

12. Isolated and substantially purified protein encoded by the nucleic acid of claim 6.

13. Isolated and substantially purified viral inhibitory protein 1 and 2 encoded by the
nucleic acid of claim 9.

14. Isolated and substantially purified viral inhibitory protein having the amino acid
sequence of SEQ ID NO: 2.

15. Isolated and substantially purified protein having an amino acid sequence that is at
least 90% identical to the sequence of SEQ ID NO: 2.

16. Isolated and substantially purified protein having an amino acid sequence that is at
least 90% identical to the sequence of SEQ ID NO: 4.

17. Isolated and substantially purified protein having an amino acid sequence that is at
least 90% identical to the sequence of SEQ ID NO: 4.

18. A vector comprising the nucleic acid of claim 1.

19. A vector comprising the nucleic acid of claim 4.

20. A vector comprising the nucleic acid of claim 6 operably linked to an expression
control sequence.

21. A host cell comprising the nucleic acid of claim 6.

22. A host cell comprising the vector of claim 20.
23. A method of making protein 1 and 2 comprising:

a) introducing the nucleic acid of claim 6 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce protein;

c) recovering said protein.

24. A method of making protein comprising:

a) introducing the nucleic acid of claim 9 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce protein;

c) recovering said protein.

25. A method of making protein comprising:
a) introducing the nucleic acid of claim 16 into a host cell;

b) maintaining said host cell under conditions whereby said nucleic acid is expressed to
produce viral inhibitory protein;
c) recovering said protein.

26. A composition comprising purified protein and a carrier.

27. The composition according to claim 26 which further comprises viral inhibitory
protein 2.
SEQUENCE LISTING

<110> Janssen Pharmaceutica NV

<120> Novel Brain Expressed Gene and Protein associated with
Bipolar Disorder

<130> NCAG1

<140>
<141>

<160> 4

<170> PatentIn Ver. 2.1

<210> 1
<211> 9528
<212> DNA
<213> Homo sapiens

<220>
<221> CDS encoding Human NCAG1 protein
<222> (1507)..(5142)

<400> 1 

acctgctttc ggccccgccc cgcccgccgc cggcctgctc acggctcctc ccgtcctccc 60
cgaagccccg cctctgaccc cgccctgtcc tgtctccgtc ccgccccacg cccgccagcc 120
agcgtcgctg tctctcgcct tccctgaggc cccgccttca gccccgcctt caaccccgcc 180
ccgtcctgcc tccgccccgc ccccgcttgc cggcccgcgt cgccgtctct caccctcccc 240
gggctgcgcg gccggagctg gcacagagga tcctcggccg cggcgacatc accgcctggg 300
gacgcgggcg ctgctctgga tacggcgcca ccgagagaac ccgccgcccg cgggtctctg 360
tcctgcggtc cgtggttgcc cccacaagcg tccggcgttt cctgagggcg ggcgtgtccg 420
ggccgtgcgg gtcgcgggga ccgagcgcgg ctgaggagac cgagcctggg gcagcgcctg 480
ccgtagcgcg ggagacgacg cgggggtctt gcggagcccc gcgggagcct ggcccgccgt 540
gcagagcagt tttctggaac tctccacctc cgtctccctt ggggcccagt gcggcgccga 600
gcccccgtcg ggatctgcct gagaaagtgt catgaaaaaa gagcagaaga gagacctcac 660
tgttgctgaa aggggaattt tctttcgccc gttggcggtt acttcatgat cggacgagaa 720
gtatctaggt gactgaagat attccatttt tatgtttgta cacatgaagc tgataaaaga 780
agatgtgaac atgatttctc tttgtcataa taggctgatg agtaagtaag cctgaaaaat 840
atttgaaatg aaggcaagaa ttttgaattt ttaaaaacca actaagactt tgatcacttg 900
ttgaggatgt ttctctctca taaatgaaag aaaaacgtat tcacaagaca agaagtataa 960
aaagttgaga ggaatgacaa ctgagtccac tcactcgaag aatgtcagta cttcatcatc 1020
ttctttgggc aaacatacac aaatgcatca tacatgtgtg gtgagcttat caccagtgat 1080



WO 02/101044 






PCT/EP02/06316 





ggttttctgt gctagaaatg actcttaatt tgaattttgg agtgcttttt ctcttttttt 1140
acaatgtgtg ttccaactct ttgtgttaaa tagatttaag taaaggaggt aaatgctaaa 1200
ttcatagtgt tttttacctg tatcacttcc ctgtgtatta tggaaaaatt agagatttta 1260
acgttattca aagttttact ggaagcaaaa ctgtgccagg gacagagata [illegible] 1320
gtttctcttt ttggcaactg cacttgctta naatgtactg aatgtcagct ggatttcaca 1380
gcatatcaga tttacagtct ttgtcttatc aaggccttta ctgtatgttt tatactaacc 1440
agatgggaaa cacattgagc atcatatctg acatgtatgc ctaagggagg agctccccca 1500

tggatc atg gcg tta atg ttt aca gga cat tta cta ttc tta gca tta 1548
Met Ala Leu Met Phe Thr Gly His Leu Leu Phe Leu Ala Leu
1 5 10



ttg atg ttt gct ttc tct act ttt gag gaa tct gtg agc aat tat tcc 1596
Leu Met Phe Ala Phe Ser Thr Phe Glu Glu Ser Val Ser Asn Tyr Ser
15 20 25 30

gaa tgg gca gtt ttc aca gat gat ata gat cag ttt aaa aca cag aaa 1644
Glu Trp Ala Val Phe Thr Asp Asp Ile Asp Gln Phe Lys Thr Gln Lys
35 40 45

gtg caa gat ttc aga ccc aac caa aag ctg aag aaa agt atg ctt cat 1692
Val Gln Asp Phe Arg Pro Asn Gln Lys Leu Lys Lys Ser Met Leu His
50 55 60

cca agt tta tat ttt gat gct gga gaa atc caa gca atg aga caa aag 1740
Pro Ser Leu Tyr Phe Asp Ala Gly Glu Ile Gln Ala Met Arg Gln Lys
65 70 75

tct cgt gca agc cat ttg cat ctt ttt aga gct atc aga agt gca gtg 1788
Ser Arg Ala Ser His Leu His Leu Phe Arg Ala Ile Arg Ser Ala Val
80 85 90

aca gtt atg ctg tcc aac cca aca tac tac cta cct cca cca aag cat 1836
Thr Val Met Leu Ser Asn Pro Thr Tyr Tyr Leu Pro Pro Pro Lys His
95 100 105 110

gct gat ttt gct gcc aag tgg aat gaa att tat ggt aac aat ctg cct 1884
Ala Asp Phe Ala Ala Lys Trp Asn Glu Ile Tyr Gly Asn Asn Leu Pro
115 120 125

cct tta gca ttg tac tgt ttg tta tgc cca gaa gac aaa gtt gcc ttt 1932
Pro Leu Ala Leu Tyr Cys Leu Leu Cys Pro Glu Asp Lys Val Ala Phe
130 135 140

gaa ttt gtc ttg gaa tat atg gac agg atg gtt ggc tac aaa gac tgg 1980
Glu Phe Val Leu Glu Tyr Met Asp Arg Met Val Gly Tyr Lys Asp Trp
145 150 155

cta gta gag aat gca cca gga gat gag gtt cca att ggc cat tcc tta 2028
Leu Val Glu Asn Ala Pro Gly Asp Glu Val Pro Ile Gly His Ser Leu
160 165 170

aca ggt ttt gcc act gcc ttt gac ttt tta tat aac tta tta gat aat 2076
Thr Gly Phe Ala Thr Ala Phe Asp Phe Leu Tyr Asn Leu Leu Asp Asn
175 180 185 190

cat cga aga caa aaa tac ctg gaa aaa ata tgg gtt att act gag gaa 2124
His Arg Arg Gln Lys Tyr Leu Glu Lys Ile Trp Val Ile Thr Glu Glu
195 200 205

atg tac gag tat tcc aag gtc cgc tca tgg ggc aaa cag ctt ctc cat 2172
Met Tyr Glu Tyr Ser Lys Val Arg Ser Trp Gly Lys Gln Leu Leu His
210 215 220

aac cac caa gcc act aat atg ata gca tta ctc aca ggg gcc ttg gtg 2220
Asn His Gln Ala Thr Asn Met Ile Ala Leu Leu Thr Gly Ala Leu Val
225 230 235

act gga gta gat aaa gga tct aaa gca aat ata tgg aaa cag gct gta 2268
Thr Gly Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Ala Val
240 245 250

gtg gat gtc atg gaa aag aca atg ttt cta ttg aat cat att gtt gat 2316
Val Asp Val Met Glu Lys Thr Met Phe Leu Leu Asn His Ile Val Asp
255 260 265 270

ggt tct ttg gat gaa ggt gtg gcc tat gga agc tac aca gct aaa tcc 2364
Gly Ser Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ala Lys Ser
275 280 285

gtc aca cag tat gtt ttt ctg gcc cag cgc cat ttt aat atc aac aac 2412
Val Thr Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn
290 295 300

ttg gat aat aac tgg tta aag atg cac ttt tgg ttc tat tat gcc acc 2460
Leu Asp Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr
305 310 315

ctt tta cct ggc ttc caa aga act gtg ggt ata gca gat tcc aat tat 2508
Leu Leu Pro Gly Phe Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr
320 325 330

aat tgg ttt tat ggt cca gaa agc cag cta gtt ttc ttg gat aag ttc 2556
Asn Trp Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe
335 340 345 350

atc tta aag aat gga gct gga aat tgg tta gct cag caa att aga aag 2604
Ile Leu Lys Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys
355 360 365

cac cga cct aaa gat gga ccg atg gtt cct tca act gcc caa agg tgg 2652
His Arg Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp
370 375 380

agt act ctt cac act gaa tac atc tgg tat gat ccc cag ctc aca cca 2700
Ser Thr Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Gln Leu Thr Pro
385 390 395

cag cca cct gct gat tat ggt act gca aaa ata cac aca ttc cct aac 2748
Gln Pro Pro Ala Asp Tyr Gly Thr Ala Lys Ile His Thr Phe Pro Asn
400 405 410

tgg ggt gtg gtt act tat ggg gct ggg ttg cca aac aca cag acc aac 2796
Trp Gly Val Val Thr Tyr Gly Ala Gly Leu Pro Asn Thr Gln Thr Asn
415 420 425 430

acc ttt gtg tct ttt aaa tct ggg aag ctg ggg gga cga gct gtg tat 2844
Thr Phe Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr
435 440 445
gac ata gtt cat ttt cag cca tat tcc tgg att gat ggg tgg aga agt 2892
Asp Ile Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser
450 455 460

ttt aac cca gga cat gag cat cca gat cag aac tca ttt act ttt gcc 2940
Phe Asn Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala
465 470 475

ccc aat gga caa gta ttt gtt tct gaa gct ctc tat gga ccc aag ttg 2988
Pro Asn Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu
480 485 490

agc cac ctt aac aat gta ttg gtg ttt gct cca tca ccc tca agc cag 3036
Ser His Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln
495 500 505 510

tgt aat aag ccc tgg gaa ggt caa ctg gga gaa tgt gcg cag tgg ctt 3084
Cys Asn Lys Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu
515 520 525

aag tgg act ggc gag gag gtt ggt gat gca gct ggg gaa ata atc act 3132
Lys Trp Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Ile Ile Thr
530 535 540

gcc tct caa cat ggg gaa atg gta ttt gtg agt ggg gaa gcc gtg tct 3180
Ala Ser Gln His Gly Glu Met Val Phe Val Ser Gly Glu Ala Val Ser
545 550 555

gct tat tct tca gca atg aga ctg aaa agt gta tat cgt gct ttg ctt 3228
Ala Tyr Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu
560 565 570

ctc tta aat tcc caa act ctg cta gtt gtt gat cat att gag agg caa 3276
Leu Leu Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln
575 580 585 590

gaa gat tcc cca ata aat tct gtc agt gcc ttc ttt cat aat ttg gat 3324
Glu Asp Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp
595 600 605

att gat ttt aaa tat atc cca tat aag ttt atg aat agg tat aat ggt 3372
Ile Asp Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly
610 615 620

gcc atg atg gat gtg tgg gat gca cat tac aaa atg ttt tgg ttt gat 3420
Ala Met Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp
625 630 635

cat cat ggc aat agt ccc atg gcc agt ata cag gaa gca gag caa gct 3468
His His Gly Asn Ser Pro Met Ala Ser Ile Gln Glu Ala Glu Gln Ala
640 645 650

gct gaa ttt aaa aaa cga tgg act caa ttt gtt aat gtt act ttt cag 3516
Ala Glu Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe Gln
655 660 665 670

atg gaa ccc aca atc aca aga att gca tat gtc ttt tat ggg cca tat 3564
Met Glu Pro Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr
675 680 685

atc aat gtc tcc agc tgc aga ttt att gat agt tcc aat cct gga ctt 3612
Ile Asn Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Asn Pro Gly Leu
690 695 700
cag att tct ctc aat gtc aat aat act gaa cat gtt gtt tct att gta 3660
Gln Ile Ser Leu Asn Val Asn Asn Thr Glu His Val Val Ser Ile Val
705 710 715

act gat tac cat aac ctg aag aca aga ttc aat tat ctg gga ttc ggt 3708
Thr Asp Tyr His Asn Leu Lys Thr Arg Phe Asn Tyr Leu Gly Phe Gly
720 725 730

ggc ttt gcc agt gtg gct gat caa ggc caa ata acc cga ttt ggt ttg 3756
Gly Phe Ala Ser Val Ala Asp Gln Gly Gln Ile Thr Arg Phe Gly Leu
735 740 745 750

ggc act caa gca ata gta aag cct gta aga cat gat agg att att ttc 3804
Gly Thr Gln Ala Ile Val Lys Pro Val Arg His Asp Arg Ile Ile Phe
755 760 765

ccc ttt gga ttt aaa ttt aat ata gca gtt gga tta att ttg tgc att 3852
Pro Phe Gly Phe Lys Phe Asn Ile Ala Val Gly Leu Ile Leu Cys Ile
770 775 780

agc ttg gtg att tta act ttc caa tgg cgt ttt tac ctt tct ttt aga 3900
Ser Leu Val Ile Leu Thr Phe Gln Trp Arg Phe Tyr Leu Ser Phe Arg
785 790 795

aaa cta atg cga tgg ata tta ata ctt gtt att gcc ttg tgg ttt att 3948
Lys Leu Met Arg Trp Ile Leu Ile Leu Val Ile Ala Leu Trp Phe Ile
800 805 810

gag ctt ttg gat gtg tgg agc act tgt agt cag ccc att tgt gca aaa 3996
Glu Leu Leu Asp Val Trp Ser Thr Cys Ser Gln Pro Ile Cys Ala Lys
815 820 825 830

tgg aca agg aca gag gct gag gga agc aag aag tct ttg tct tct gaa 4044
Trp Thr Arg Thr Glu Ala Glu Gly Ser Lys Lys Ser Leu Ser Ser Glu
835 840 845

ggg cac cac atg gat ctt cct gat gtt gtc att acc tca ctt cct ggt 4092
Gly His His Met Asp Leu Pro Asp Val Val Ile Thr Ser Leu Pro Gly
850 855 860

tca gga gct gaa att ctc aaa caa ctt ttt ttc aac agt agt gat ttt 4140
Ser Gly Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe
865 870 875

ctc tac atc agg gtt cct aca gcc tac att gat att cct gaa act gag 4188
Leu Tyr Ile Arg Val Pro Thr Ala Tyr Ile Asp Ile Pro Glu Thr Glu
880 885 890

ttg gaa atc gac tca ttt gta gat gct tgt gaa tgg aag gtg tca gat 4236
Leu Glu Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp
895 900 905 910

atc cgc agt ggg cat ttt cgt tta ctc cga ggc tgg ttg cag tct tta 4284
Ile Arg Ser Gly His Phe Arg Leu Leu Arg Gly Trp Leu Gln Ser Leu
915 920 925

gtc cag gac aca aaa tta cat ttg caa aac atc cat ctg cat gaa ccc 4332
Val Gln Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Pro
930 935 940

aat agg ggt aaa ctg gcc caa tat ttt gca atg aat aag gac aaa aaa 4380
Asn Arg Gly Lys Leu Ala Gln Tyr Phe Ala Met Asn Lys Asp Lys Lys
945 950 955

aga aaa ttt aaa agg aga gag tct ttg cca gaa caa aga agt caa atg 4428 
Arg Lys Phe Lys Arg Arg Glu Ser Leu Pro Glu Gin Arg Ser Gin Met 
5 960 965 970 

aaa ggc gcc ttt gat aga gat get gaa tat att agg get ttg agg aga 4476 
Lys Gly Ala Phe Asp Arg Asp Ala Glu Tyr lie Arg Ala Leu Arg Arg 
975 980 985 990 



10 



30 



50 



cac ctg gtt tac tat cca agt gca cgt cct gtg etc agt tta age agt 4524 
His Leu Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser 
995 1000 1005 



15 gga age tgg acg tta aag ctt cat ttt ttt cag gaa gtt tta gga get 4572 
Gly Ser Trp Thr Leu Lys Leu His Phe Phe Gin Glu Val Leu Gly Ala 
1010 1015 1020 

teg atg agg gca ttg tac ata gta aga gac cct egg gca tgg att tat 4620 
20 Ser Met Arg Ala Leu Tyr He Val Arg Asp Pro Arg Ala Trp He Tyr 
1025 1030 1035 

tea atg ttg tac aat agt aaa cca agt ctt tat tct ttg aag aat gta 4668 
Ser Met Leu Tyr Asn Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val 
25 1040 1045 1050 

cca gag cat tta gca aaa ttg ttt aaa ata gag gga ggt aaa ggc aaa 4716 
Pro Glu His Leu Ala Lys Leu Phe Lys He Glu Gly Gly Lys Gly Lys 
1055 1060 1065 1070 



tgt aac tta aat teg ggt tat get ttc gag tat gaa cca ttg agg aaa 4764 
Cys Asn Leu Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Pro Leu Arg Lys 
1075 1080 1085 



35 gaa tta tea aaa tec aaa tea aat gca gtg tec etc ttg tct cac ttg 4812 
Glu Leu Ser Lys Ser Lys Ser Asn Ala Val Ser Leu Leu Ser His Leu 
1090 1095 1100 

tgg eta gca aat aca gca gca gcc ttg aga ata aat aca gat ttg ctg 4860 
40 Trp Leu Ala Asn Thr Ala Ala Ala Leu Arg He Asn Thr Asp Leu Leu 
1105 1110 1115 

cct act age tac cag ctg gtc aag ttt gaa gat att gtg cat ttt cct 4908 
Pro Thr Ser Tyr Gin Leu Val Lys Phe Glu Asp He Val His Phe Pro 
45 1120 1125 1130 

cag aaa act act gaa agg att ttt gcc ttt ctt gga att cct ttg tct 4956 
Gin Lys Thr Thr Glu Arg He Phe Ala Phe Leu Gly He Pro Leu Ser 
1135 1140 1145 1150 



cct get agt tta aac caa ata ttg ttt gcc ace tct aca aac ctt ttt 5004 
Pro Ala Ser Leu Asn Gin He Leu Phe Ala Thr Ser Thr Asn Leu Phe 
1155 1160 1165 



55 tac ctt ccc tat gaa ggg gaa ata tea cca act aat act aat gtt tgg 5052 
Tyr Leu Pro Tyr Glu Gly Glu He Ser Pro Thr Asn Thr Asn Val Trp 
1170 1175 1180 

aaa cag aac ttg cct aga gat gaa att aaa eta att gaa aac ate tgc 5100 
60 Lys Gin Asn Leu Pro Arg Asp Glu He Lys Leu He Glu Asn He Cys 
1185 1190 1195 

tgg act ctg atg gat cgc cta gga tat cca aag ttt atg gac 5142
Trp Thr Leu Met Asp Arg Leu Gly Tyr Pro Lys Phe Met Asp
1200 1205 1210

taaatgctgc aggtcagcag aaatttgcac taataatact taccaaccca ctttgtggat 5202

atgaatcaga agagtttgtt tattctttag tgtgtgtgtg tgtgtgcacg cgtgtatgtg 5262

ttcagtgttg tttgcacaga gagattgttt taaaaaatgg caccatattt ggcctagcag 5322

gatttatttt tatgtcatca cctcccttgc ctttgtttct gaaaattttg tctgctaaaa 5382

agtttctgct acagagtggt agatgaagtt atatcatggg gtcaggggag atgggaaaat 5442

tttaagtttt tgtctaactc cccttcatct gtaactgtgc taatctatct agagacctca 5502

aacactgcta aaggccttgc aattgctgct ttacccacgc atctcttgct ttcaagatgg 5562

actacaaaag ttccttatcc ttttgaaaag gtcttctgac acacttatct tgcacaaaga 5622

aaaagaaaat ttcttttact gtgtttaatg ttcagtgata tcactgagga aatggtgaaa 5682

gctcctatca gaactatagg atttcttctg ggaaatacag atggaaatac agaatgaata 5742

tgtttttttg aggtcggaaa ctgactttaa aagcctcctt gaagtttttt acttagaaat 5802

ataaggaata agtctttgaa caatctgggt ggcaagggct ggtagattat tttagacatg 5862

attgtctgtt taaaactctc ctttcacttt ttatcctccc tggagctaca gctgttcgcc 5922

atcacatcac tcccatccta tcctttctgt cactgtcaag caaaacaatc agtagttact 5982

aatcgctgaa ctctcaatat tgtggggcat tttcccccca gttgattaat tttgcgttaa 6042

agactgacac agacttagaa tcaaatttat ttttctggaa ttaacactct gtgactcaaa 6102

gtagtgccac tgcagtgtct ttttaaactg gaaacagaat tggaaaactg cctgacttat 6162

cttgcatccc tttgaatgag tttacagact gccagtgtct gcaaaagttg aaagcaaatg 6222

ggagatgatg tcagaggcat ctgtttcctt taccatctgc atcttattat aaatgtagtc 6282

gtcataaagt gtggtttatt ttattttggt aggctctgaa atcaaaatgc tacgccatta 6342

taagccagtg gagtaattac aatgtattgg atgaaaacat aaggcagtgt ggagacttga 6402

tgaaaatctc tgtacagatt gcagtcttct tcctgatgtt tcaaactgtg gttcccccaa 6462

gctctctaac acttggaagt ctgtcattct gacctagata aaagtggttc tttctcagta 6522

gttattatta tgtcaaaatg tgcctccaga gtgataaagc tctgtatatg ttagattcca 6582

gctaaaccta acttggctgt catttttctt ccattatagt gtgagtggag actgcccccc 6642

ctccccaaca tattccttcc catatctctc atgattgtcc ctctgtaatt tcaaaatgaa 6702

tgaaattcat gtgaatgtag gttgagaggg cactgaagac ctgaatctac actagtaatc 6762

tcaagaaaga ttattcattc tatctcagag ttaccggcaa gcatataaaa tgctacttgg 6822

ataatatcta catgaatatt gcatgctaca tggttgataa cactatttcc attattgggc 6882

agaatctcag tgtttacttt caattcctag gatatgtgat cgtgaatcag atcacatata 6942

aaaagtctgg attgtcagta gtattagatc tgatcaaggt aggaattaca attgcatgca 7002

ggtagcaagc aagaaagcag aaactactgt tccctttatt ttaacattgt acagacaata 7062

cagaaatgta cctgttggcg gccgggtgca gtggctcacg cctgtaatcc cagcacttcg 7122

ggaggccgag gcgggtggat cacgaggtca ggagatcaag accatcctgg ctaacacggt 7182

gaaaccccgt ctctactaaa aaaaaagtac aaaaaattag ccgggcgtgg tggcgggcac 7242

ctgtagtccc agctacacgg gaggctgagg caggagaatg gcatgaacct gggaggcaga 7302

gcttgcagtg agtggagatg cgccactgca ctccagcctg ggcgacagag cgagactccg 7362

cctcaaaaaa aaaaaaaaaa aagaaaaaaa gaaatgtacc tgttggcagg agaaggccag 7422

atggagtatg tggagtaata gggaaagaag agttacagaa aatgaaaaag aaaatgagtt 7482

acactgagaa tgaatatggg aacacgtcat tgatagcaaa agaaaggtac aggcttacga 7542

aaatgatctt tacaatgtat cccagctttc acccccacat ggcaatgcag agttgtattt 7602

acttgtttct gtactcacct actcccaccc caagggaaga ttttagacat gaaccctact 7662

atttagttat tctaaaatag aaagtttgct ggagaaagcg tctactcaca gattgttctg 7722

taaggaatgt tatgtatggg tgagcgggtg acacatccat tgggtatgta tgcatgtgat 7782

ggtgcctgag acccctgcct tagaaacaga attcctaagg ggattgactc tcccagcatg 7842

ttcccaggtc ctgcaccctt agggtgatct aggaaaattt taaatagctt ctactcttat 7902

ttttgttctt tgaaataatt aaaagaggga ttatcactat ctgatacttc tgaaagaaac 7962

acttacaaaa tttcttatct gtaaaatccg tctttttcta cattaacttc cccaaacata 8022

ggcctaattg agataattgc ttttattata ataataggat tgaaatttta aaattttgaa 8082

aggacttatt aattttgctg acaaaagtga agtaacaaat ataatgataa ttggcttttt 8142

aaattttcaa acaacataga tttactcaag atgaaataaa aaggccatat tcagagttga 8202

atttaatgaa aactcagagg aaataggaaa atctgctcag gagaaagaag ctaaatctgc 8262

atagatttag tttgtagaat ttaatttaaa atttaaattt taacaaagtg atgacacaac 8322

aatatgtacg tttaggtgtg gacaccaaaa tattagacat ttgattgtcc ttttacatag 8382

agaataacta ataaatgcct gacaagaatg ggacaatcct tccttgtatc aaaattccca 8442

ggtcttgcta cattgccctc tgcaaatgta ttcaaagaag aacctcctcc accacttact 8502

tttggttggc ataattgttc agcaacgatt tctgtacatc accaagtatc tttggcattc 8562

ttggtataca aagtatatca caattttaag tgagtaaata ttaatgataa tttttgaatt 8622

gctttgtttg gcttgattaa ctttgatcag aaatagaaac gttttcattt gttgatttag 8682

gaaaaagcat aaatagaatg cagtataaca ccacttccaa aggtaaggat acctaacatt 8742

cttttttttt tttttttttt ttttggggat ggagtctcac tttgttgccc aggctggagt 8802

gcagtggtct gatctcggct cactgcaacc tccgcctacc gggttcaagt gattctccta 8862

cgtcagcctc ctgaatagct gggattacag gtgcacgcca ccatgcttgg ctcatttttg 8922

tatttttagt agtgacagcg tttcaccaca ttggtcaggc tggntctcaa tctcttgacc 8982

tggtgatctg cccacctggg cctcccaaaa tgctgggatt acaggcatga gccaccacac 9042

ctggcaaggg tacctgacat tctaagatat caagacactt aatatgtggg ctattagctg 9102

cttatttaaa tgttgaccaa attgtctgat atatctgatt aatcatgatt tcacttcatt 9162

tcggaagaaa aattatccat atcattttta aagacgcaaa tgactttgga tttttgcata 9222

gagtacaata gacacttcaa acaatagatt ctaacattct ctgaaacact tgagatgttt 9282

gagctaccat ttatatgggt tatttatatt tagtctaagt aacacataca tgtttaattg 9342

attctgtttt catggataga ttcaactaag tcttccaagc aattaatttt ttgttcgtcg 9402

tcgtttttyc ttcatacgtt atctagttat gcagcactgg aaacagactg aagatcataa 9462

accagtttta tcagacctat gtgtaataag actcctgtta atacaaaaat aaaaagctaa 9522

aagcaa 9528


<210> 2
<211> 1212
<212> PRT
<213> Homo sapiens

<220>
<221> Amino acid sequence of human NCAG1 protein
<400> 2

Met Ala Leu Met Phe Thr Gly His Leu Leu Phe Leu Ala Leu Leu Met
1 5 10 15

Phe Ala Phe Ser Thr Phe Glu Glu Ser Val Ser Asn Tyr Ser Glu Trp
20 25 30

Ala Val Phe Thr Asp Asp Ile Asp Gln Phe Lys Thr Gln Lys Val Gln
35 40 45

Asp Phe Arg Pro Asn Gln Lys Leu Lys Lys Ser Met Leu His Pro Ser
50 55 60

Leu Tyr Phe Asp Ala Gly Glu Ile Gln Ala Met Arg Gln Lys Ser Arg
65 70 75 80

Ala Ser His Leu His Leu Phe Arg Ala Ile Arg Ser Ala Val Thr Val
85 90 95

Met Leu Ser Asn Pro Thr Tyr Tyr Leu Pro Pro Pro Lys His Ala Asp
100 105 110

Phe Ala Ala Lys Trp Asn Glu Ile Tyr Gly Asn Asn Leu Pro Pro Leu
115 120 125

Ala Leu Tyr Cys Leu Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe
130 135 140

Val Leu Glu Tyr Met Asp Arg Met Val Gly Tyr Lys Asp Trp Leu Val
145 150 155 160

Glu Asn Ala Pro Gly Asp Glu Val Pro Ile Gly His Ser Leu Thr Gly
165 170 175

Phe Ala Thr Ala Phe Asp Phe Leu Tyr Asn Leu Leu Asp Asn His Arg
180 185 190

Arg Gln Lys Tyr Leu Glu Lys Ile Trp Val Ile Thr Glu Glu Met Tyr
195 200 205

Glu Tyr Ser Lys Val Arg Ser Trp Gly Lys Gln Leu Leu His Asn His
210 215 220

Gln Ala Thr Asn Met Ile Ala Leu Leu Thr Gly Ala Leu Val Thr Gly
225 230 235 240

Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Ala Val Val Asp
245 250 255

Val Met Glu Lys Thr Met Phe Leu Leu Asn His Ile Val Asp Gly Ser
260 265 270

Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ala Lys Ser Val Thr
275 280 285

Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn Leu Asp
290 295 300

Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu
305 310 315 320

Pro Gly Phe Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp
325 330 335

Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu
340 345 350

Lys Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg
355 360 365

Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr
370 375 380

Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Gln Leu Thr Pro Gln Pro
385 390 395 400

Pro Ala Asp Tyr Gly Thr Ala Lys Ile His Thr Phe Pro Asn Trp Gly
405 410 415

Val Val Thr Tyr Gly Ala Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe
420 425 430

Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile
435 440 445

Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn
450 455 460

Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn
465 470 475 480



Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His
485 490 495

WO 02/101044 11/22 PCT/EP02/06316

Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn
500 505 510

Lys Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp
515 520 525

Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Ile Ile Thr Ala Ser
530 535 540

Gln His Gly Glu Met Val Phe Val Ser Gly Glu Ala Val Ser Ala Tyr
545 550 555 560

Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu
565 570 575

Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln Glu Asp
580 585 590

Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp
595 600 605

Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met
610 615 620

Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp His His
625 630 635 640

Gly Asn Ser Pro Met Ala Ser Ile Gln Glu Ala Glu Gln Ala Ala Glu
645 650 655

Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe Gln Met Glu
660 665 670

Pro Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Ile Asn
675 680 685

Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Asn Pro Gly Leu Gln Ile
690 695 700

Ser Leu Asn Val Asn Asn Thr Glu His Val Val Ser Ile Val Thr Asp
705 710 715 720

Tyr His Asn Leu Lys Thr Arg Phe Asn Tyr Leu Gly Phe Gly Gly Phe
725 730 735

Ala Ser Val Ala Asp Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr
740 745 750

Gln Ala Ile Val Lys Pro Val Arg His Asp Arg Ile Ile Phe Pro Phe
755 760 765

Gly Phe Lys Phe Asn Ile Ala Val Gly Leu Ile Leu Cys Ile Ser Leu
770 775 780

Val Ile Leu Thr Phe Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu
785 790 795 800

Met Arg Trp Ile Leu Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu
805 810 815

Leu Asp Val Trp Ser Thr Cys Ser Gln Pro Ile Cys Ala Lys Trp Thr
820 825 830

Arg Thr Glu Ala Glu Gly Ser Lys Lys Ser Leu Ser Ser Glu Gly His
835 840 845

His Met Asp Leu Pro Asp Val Val Ile Thr Ser Leu Pro Gly Ser Gly
850 855 860

Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr
865 870 875 880

Ile Arg Val Pro Thr Ala Tyr Ile Asp Ile Pro Glu Thr Glu Leu Glu
885 890 895

Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg
900 905 910

Ser Gly His Phe Arg Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln
915 920 925

Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Pro Asn Arg
930 935 940

Gly Lys Leu Ala Gln Tyr Phe Ala Met Asn Lys Asp Lys Lys Arg Lys
945 950 955 960

Phe Lys Arg Arg Glu Ser Leu Pro Glu Gln Arg Ser Gln Met Lys Gly
965 970 975

Ala Phe Asp Arg Asp Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu
980 985 990

Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser
995 1000 1005

Trp Thr Leu Lys Leu His Phe Phe Gln Glu Val Leu Gly Ala Ser Met
1010 1015 1020

Arg Ala Leu Tyr Ile Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Met
1025 1030 1035 1040

Leu Tyr Asn Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu
1045 1050 1055

His Leu Ala Lys Leu Phe Lys Ile Glu Gly Gly Lys Gly Lys Cys Asn
1060 1065 1070

Leu Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Pro Leu Arg Lys Glu Leu
1075 1080 1085

Ser Lys Ser Lys Ser Asn Ala Val Ser Leu Leu Ser His Leu Trp Leu
1090 1095 1100

Ala Asn Thr Ala Ala Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr
1105 1110 1115 1120

Ser Tyr Gln Leu Val Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys
1125 1130 1135

Thr Thr Glu Arg Ile Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala
1140 1145 1150

Ser Leu Asn Gln Ile Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu
1155 1160 1165

Pro Tyr Glu Gly Glu Ile Ser Pro Thr Asn Thr Asn Val Trp Lys Gln
1170 1175 1180

Asn Leu Pro Arg Asp Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr
1185 1190 1195 1200

Leu Met Asp Arg Leu Gly Tyr Pro Lys Phe Met Asp
1205 1210



<210> 3
<211> 5092
<212> DNA
<213> Mus sp.

<220>
<221> CDS encoding the mouse NCAG1 protein
<222> (501)..(4121)

<400> 3



tctgagaatg acagtacttt atcatcttct tttggggaac atacagaaac ataccattta 60

tgtgtggtaa gttaatcact acagatggtt tcttgtgcta cgtggtcaaa tggcttcatt 120

tgaattttgg aattttaaaa aattttttct ttttcacatg ttaattagat ttacacacag 180

ggagtaaatg ttggatttgt tgtattttct gactagacca ctgttttctg tgcattggag 240

acattggagg cattaatatt ccttgaaatt ttattttatt ggaagcaaac ctgtgccagg 300

gacacagaca tgctatataa tttcctaact tttcttgctt tgaataagct gaatgtcacc 360

tggatttcac agcctatgag gtatagtctg ttttttgttt ttgttttttt gctacatctt 420

taatatataa tttacaataa ccagatggga aacactgtgc ttaacacata tgcctaagga 480

aaagatcttc cccatggatc atg gcg ttt atg ttt aca gaa cat tta cta ttt 533
Met Ala Phe Met Phe Thr Glu His Leu Leu Phe
1 5 10

tta aca ttg atg atg tgt agt ttt tct act tgt gaa gaa tct gtg agc 581
Leu Thr Leu Met Met Cys Ser Phe Ser Thr Cys Glu Glu Ser Val Ser
15 20 25

aat tat tct gaa tgg gca gtt ttc aca gac gat ata caa tgg ctt aag 629
Asn Tyr Ser Glu Trp Ala Val Phe Thr Asp Asp Ile Gln Trp Leu Lys
30 35 40

tca cag aaa ata caa gat ttc aaa ctc aac cga aga ctt cat cca aat 677
Ser Gln Lys Ile Gln Asp Phe Lys Leu Asn Arg Arg Leu His Pro Asn
45 50 55

tta tat ttt gat gct gga gat ata caa aca ttg aaa caa aag tct cgt 725
Leu Tyr Phe Asp Ala Gly Asp Ile Gln Thr Leu Lys Gln Lys Ser Arg
60 65 70 75

aca agc cat ttg cat att ttt aga gct atc aaa agt gca gtg aca att 773
Thr Ser His Leu His Ile Phe Arg Ala Ile Lys Ser Ala Val Thr Ile
80 85 90

atg ctg tcc aat cca tca tac tac cta cct cca ccc aag cat gct gag 821
Met Leu Ser Asn Pro Ser Tyr Tyr Leu Pro Pro Pro Lys His Ala Glu
95 100 105

ttt gct gcc aag tgg aat gaa att tat ggt aat aat ctt cct cct tta 869
Phe Ala Ala Lys Trp Asn Glu Ile Tyr Gly Asn Asn Leu Pro Pro Leu
110 115 120

gca ttg tat tgt tta tta tgc cca gaa gac aag gtt gcc ttt gaa ttt 917
Ala Leu Tyr Cys Leu Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe
125 130 135

gtt atg gaa tac atg gat cgg atg gtt agc tac aaa gac tgg cta gtt 965
Val Met Glu Tyr Met Asp Arg Met Val Ser Tyr Lys Asp Trp Leu Val
140 145 150 155

gag aat gca cca ggg gat gag gtt cca gtt ggc cat tct tta aca ggt 1013
Glu Asn Ala Pro Gly Asp Glu Val Pro Val Gly His Ser Leu Thr Gly
160 165 170

ttt gcc act gcc ttt gac ttt tta tat aat cta tta ggt aat cag cgt 1061
Phe Ala Thr Ala Phe Asp Phe Leu Tyr Asn Leu Leu Gly Asn Gln Arg
175 180 185

aaa caa aaa tac cta gaa aaa att tgg att gtt act gag gaa atg tat 1109
Lys Gln Lys Tyr Leu Glu Lys Ile Trp Ile Val Thr Glu Glu Met Tyr
190 195 200

gaa tat tcc aag att cga tca tgg ggc aaa caa ctt ctt cat aac cat 1157
Glu Tyr Ser Lys Ile Arg Ser Trp Gly Lys Gln Leu Leu His Asn His
205 210 215

caa gct aca aat atg ata gct tta ctc ata ggg gcc ttg gtt act gga 1205
Gln Ala Thr Asn Met Ile Ala Leu Leu Ile Gly Ala Leu Val Thr Gly
220 225 230 235

gta gat aaa gga tct aaa gca aac ata tgg aaa caa gtt gtt gtt gat 1253
Val Asp Lys Gly Ser Lys Ala Asn Ile Trp Lys Gln Val Val Val Asp
240 245 250

gtg atg gaa aag act atg ttt ctc ttg aag cat att gta gat ggc tca 1301
Val Met Glu Lys Thr Met Phe Leu Leu Lys His Ile Val Asp Gly Ser
255 260 265

ttg gat gaa ggt gtg gcc tat gga agc tat acc tca aaa tca gtt aca 1349
Leu Asp Glu Gly Val Ala Tyr Gly Ser Tyr Thr Ser Lys Ser Val Thr
270 275 280

cag tat gtt ttt ttg gca caa cgc cat ttt aac atc aac aac ttt gat 1397
Gln Tyr Val Phe Leu Ala Gln Arg His Phe Asn Ile Asn Asn Phe Asp
285 290 295

aat aac tgg cta aaa atg cat ttt tgg ttt tat tat gct aca ctt ttg 1445
Asn Asn Trp Leu Lys Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu
300 305 310 315

cca ggc tat caa aga act gta ggc ata gca gat tcc aat tat aat tgg 1493
Pro Gly Tyr Gln Arg Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp
320 325 330

ttt tat ggt cca gag agc cag cta gtt ttc ttg gat aag ttc att tta 1541
Phe Tyr Gly Pro Glu Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu
335 340 345

cag aat gga gct gga aat tgg tta gct cag caa att aga aag cat cga 1589
Gln Asn Gly Ala Gly Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg
350 355 360

cct aag gat gga cca atg gtt cct tcc act gct cag cgg tgg agt act 1637
Pro Lys Asp Gly Pro Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr
365 370 375

ctt cat act gaa tac atc tgg tat gat cca aca ctc acc cca cag cct 1685
Leu His Thr Glu Tyr Ile Trp Tyr Asp Pro Thr Leu Thr Pro Gln Pro
380 385 390 395

cct gtt gat ttt ggc act gca aaa atg cac aca ttt cct aac tgg ggt 1733
Pro Val Asp Phe Gly Thr Ala Lys Met His Thr Phe Pro Asn Trp Gly
400 405 410

gtc gtg act tat ggg ggt ggg ctg cca aac acc cag acc aat acc ttt 1781
Val Val Thr Tyr Gly Gly Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe
415 420 425

gtg tct ttt aaa tct ggg aaa ctg gga gga cga gct gtg tat gac ata 1829
Val Ser Phe Lys Ser Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile
430 435 440

gtt cac ttt cag cca tat tcc tgg att gat gga tgg aga agc ttt aac 1877
Val His Phe Gln Pro Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn
445 450 455

cca gga cat gaa cat cca gat caa aat tca ttt act ttc gct cct aat 1925
Pro Gly His Glu His Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn
460 465 470 475

ggg cag gta ttc gtt tct gag gct ctt tat gga cca aaa ttg agc cac 1973
Gly Gln Val Phe Val Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His
480 485 490

ctt aac aac gta ttg gtg ttt gcc cca tca cca tca agt caa tgt aat 2021
Leu Asn Asn Val Leu Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn
495 500 505

cag ccc tgg gaa ggt caa ctg gga gaa tgt gca cag tgg ctc aag tgg 2069
Gln Pro Trp Glu Gly Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp
510 515 520

act ggg gaa gag gtt ggt gat gca gct ggg gaa gtt att act gct gct 2117
Thr Gly Glu Glu Val Gly Asp Ala Ala Gly Glu Val Ile Thr Ala Ala
525 530 535

caa cat ggt gat agg atg ttt gtg agt ggg gaa gca gtg tct gct tat 2165
Gln His Gly Asp Arg Met Phe Val Ser Gly Glu Ala Val Ser Ala Tyr
540 545 550 555

tct tct gcc atg aga ctg aaa agt gtc tat cgt gct tta ctt ctt tta 2213
Ser Ser Ala Met Arg Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu
560 565 570

aat tca caa act ctg ctt gtt gtc gat cat att gaa agg caa gaa act 2261
Asn Ser Gln Thr Leu Leu Val Val Asp His Ile Glu Arg Gln Glu Thr
575 580 585

tcc cca ata aat tct gtc agt gcc ttc ttt cat aat ttg gat att gat 2309
Ser Pro Ile Asn Ser Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp
590 595 600



ttt aaa tac atc cca tac aag ttt atg aat aga tat aat ggt gcc atg 2357
Phe Lys Tyr Ile Pro Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met
605 610 615

atg gat gtg tgg gat gca cac tat aaa atg ttt tgg ttt gat cac cat 2405
Met Asp Val Trp Asp Ala His Tyr Lys Met Phe Trp Phe Asp His His
620 625 630 635

ggc aac agt cct gtg gct aat ata cag gaa gca gaa cag gct gct gaa 2453
Gly Asn Ser Pro Val Ala Asn Ile Gln Glu Ala Glu Gln Ala Ala Glu
640 645 650

ttt aag aaa cgg tgg aca cag ttt gtt aat gtt aca ttt cat atg gaa 2501
Phe Lys Lys Arg Trp Thr Gln Phe Val Asn Val Thr Phe His Met Glu
655 660 665

tcc aca atc aca aga att gct tat gta ttt tat ggg cca tat gtc aat 2549
Ser Thr Ile Thr Arg Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Val Asn
670 675 680

gtt tcc agc tgc aga ttt att gat agt tcc agt tct gga ctt cag att 2597
Val Ser Ser Cys Arg Phe Ile Asp Ser Ser Ser Ser Gly Leu Gln Ile
685 690 695

tct tta cat gtc aac agt act gaa cat agt gtg tct gtt gta act gac 2645
Ser Leu His Val Asn Ser Thr Glu His Ser Val Ser Val Val Thr Asp
700 705 710 715

tat caa aac ctt aaa agc aga ttc agt tac ctg gga ttt ggt ggt ttt 2693
Tyr Gln Asn Leu Lys Ser Arg Phe Ser Tyr Leu Gly Phe Gly Gly Phe
720 725 730

gcc agt gtg gct aat caa gga cag ata acc aga ttt ggt ttg ggt act 2741
Ala Ser Val Ala Asn Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr
735 740 745

caa gaa ata gta aac cct gta aga cat gat aaa gtt aat ttc ccc ttt 2789
Gln Glu Ile Val Asn Pro Val Arg His Asp Lys Val Asn Phe Pro Phe
750 755 760

ggg ttt aaa ttt aat ata gca gtt gga ttc att ttg tgt att agt ttg 2837
Gly Phe Lys Phe Asn Ile Ala Val Gly Phe Ile Leu Cys Ile Ser Leu
765 770 775

gtt att tta act ttt caa tgg cgg ttt tac ctt tcc ttt aga aag cta 2885
Val Ile Leu Thr Phe Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu
780 785 790 795

atg cgc tgt gta tta ata ctt gtt att gcc ttg tgg ttt att gag ctt 2933
Met Arg Cys Val Leu Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu
800 805 810

ctg gat gta tgg agt aca tgc act cag ccc atc tgt gca aaa tgg aca 2981
Leu Asp Val Trp Ser Thr Cys Thr Gln Pro Ile Cys Ala Lys Trp Thr
815 820 825

agg act gaa gct aag gca aat gag aag gtc atg att tct gaa ggg cat 3029
Arg Thr Glu Ala Lys Ala Asn Glu Lys Val Met Ile Ser Glu Gly His
830 835 840

cat gtg gat ctt cct aat gtt att att acc tca ctc cct ggt tca gga 3077
His Val Asp Leu Pro Asn Val Ile Ile Thr Ser Leu Pro Gly Ser Gly
845 850 855



gct gaa att ctc aaa cag ctt ttt ttc aac agc agt gat ttt ctc tac 3125
Ala Glu Ile Leu Lys Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr
860 865 870 875

atc aga att cct aca gcc tac atg gat atc cct gaa act gaa ttt gaa 3173
Ile Arg Ile Pro Thr Ala Tyr Met Asp Ile Pro Glu Thr Glu Phe Glu
880 885 890

att gac tca ttt gta gat gct tgt gag tgg aaa gta tca gat atc cgc 3221
Ile Asp Ser Phe Val Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg
895 900 905

agt ggg cac ttt cat ctt ctt cga ggg tgg ctg cag tct ttg gtc cag 3269
Ser Gly His Phe His Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln
910 915 920

gat aca aaa ctt cac ttg caa aac atc cat cta cat gaa acc agt agg 3317
Asp Thr Lys Leu His Leu Gln Asn Ile His Leu His Glu Thr Ser Arg
925 930 935

agt aaa ctg gcc caa tat ttt aca act aat aag gac aaa aag cga aaa 3365
Ser Lys Leu Ala Gln Tyr Phe Thr Thr Asn Lys Asp Lys Lys Arg Lys
940 945 950 955

tta aaa aga agg gag tct ttg caa gat caa aga agt aga ata aaa gga 3413
Leu Lys Arg Arg Glu Ser Leu Gln Asp Gln Arg Ser Arg Ile Lys Gly
960 965 970

cca ttt gat aga gat gct gaa tat att agg gct tta aga aga cac ctt 3461
Pro Phe Asp Arg Asp Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu
975 980 985

gtt tat tac cca agt gca cgt cct gtg ctc agc tta agt agt ggt agc 3509
Val Tyr Tyr Pro Ser Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser
990 995 1000

tgg aca ttg aag ctt cat ttt ttt cag gaa gtt tta gga act tca atg 3557
Trp Thr Leu Lys Leu His Phe Phe Gln Glu Val Leu Gly Thr Ser Met
1005 1010 1015

cgg gca ttg tac ata gta aga gac cct cga gct tgg atc tat tca gtg 3605
Arg Ala Leu Tyr Ile Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Val
1020 1025 1030 1035

cta tat ggt agt aaa cca agt ctt tat tct ttg aag aat gta cca gag 3653
Leu Tyr Gly Ser Lys Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu
1040 1045 1050

cac tta gca aaa ttg ttt aaa ata gag gaa ggt aaa agc aaa tgt aat 3701
His Leu Ala Lys Leu Phe Lys Ile Glu Glu Gly Lys Ser Lys Cys Asn
1055 1060 1065

tcg aat tct ggc tat gct ttt gag tat gaa tca ctg aag aaa gaa tta 3749
Ser Asn Ser Gly Tyr Ala Phe Glu Tyr Glu Ser Leu Lys Lys Glu Leu
1070 1075 1080

gaa ata tcc caa tca aat gct atc tcc tta tta tct cat ttg tgg gta 3797
Glu Ile Ser Gln Ser Asn Ala Ile Ser Leu Leu Ser His Leu Trp Val
1085 1090 1095

gca aac act gca gca gcc ttg aga ata aat aca gat ttg ctg cct acc 3845
Ala Asn Thr Ala Ala Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr
1100 1105 1110 1115

aat tac cat ctg gtc aag ttt gaa gat att gtt cat ttt cct cag aag 3893
Asn Tyr His Leu Val Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys
1120 1125 1130

act act gaa agg att ttt gct ttc ctt ggc att cct ttg tct cct gct 3941
Thr Thr Glu Arg Ile Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala
1135 1140 1145

agt tta aac caa atg cta ttt gcc act tcc aca aac ctt ttt tat ctt 3989
Ser Leu Asn Gln Met Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu
1150 1155 1160

cca tat gag ggg gaa ata tca cca tct aat act aat att tgg aaa aca 4037
Pro Tyr Glu Gly Glu Ile Ser Pro Ser Asn Thr Asn Ile Trp Lys Thr
1165 1170 1175

aac ttg cct aga gat gaa att aaa cta att gaa aac att tgc tgg aca 4085
Asn Leu Pro Arg Asp Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr
1180 1185 1190 1195

ctg atg gat cat cta gga tat cca aag ttt atg gac taaatgctgc 4131
Leu Met Asp His Leu Gly Tyr Pro Lys Phe Met Asp
1200 1205








aggtcggcaa aatttgcact aatgtgtccc aacctacttt gtggatatga actagaaaac 4191

tttgtttatt cttgtacatg tatgtatgtg tgtagagtga gtgcgtgtgt ccagtatgtt 4251

atttgcacag agatattttc aaaataggca ccatatttgg cctagcagga tttattttta 4311

tgttaccact tttcttgcct ttgtttctga atttttttct gctaaaatgt ttctgctaca 4371

gaggtatata ttctggggtt ctgaaatatg gggttttaat ggactttaac tcaacttctt 4431

tggaaactat ttatctatct taggacctca aacactacaa acggccttgc aattgctgct 4491

gtatctagtc atctctcgct cttaatatgg actacaaaac tttatgtttt gaaaacgtct 4551

aacatttacc ttgcacacaa aaacgagaaa taaaaaaaca aaaattattt tacgttgtat 4611

agtgtttatt gaaatcactt ggtgaggctg gggggaggag cttatgataa agttccctta 4671

agaaactaga aaataaagat gaaaacatag aattaaggtt tttttgtttc tttcttcctt 4731

tttttttttt ttttgtacta agaaataaga ttgaacagtg gatactgaaa tttggtgaat 4791

tattttggaa gtgattctct catttgtctt tctgaagcta cagctgttca tcatcacact 4851

acccttaccc tgtctatcca ttctgtcatt gtcaccaaaa aaaaaaagtc agtaattact 4911

agctacaaaa ctatctaaca agcccttctc tggatgattt actttgtgtt aaagacttac 4971

acagatttat aatcacattt agttgtgtgg cattaccaca atatgactca aagcaaaagc 5031

agacttctgt ctgttgtagt gtttttaagt gtgtgttgtg gggtggggga gggsrsdbac 5091

k 5092

<210> 4
<211> 1207
<212> PRT
<213> Mus sp.

<220>
<221> Amino acid sequence of mouse NCAG1 protein
<400> 4

Met Ala Phe Met Phe Thr Glu His Leu Leu Phe Leu Thr Leu Met Met
1               5                   10                  15

Cys Ser Phe Ser Thr Cys Glu Glu Ser Val Ser Asn Tyr Ser Glu Trp
            20                  25                  30

Ala Val Phe Thr Asp Asp Ile Gln Trp Leu Lys Ser Gln Lys Ile Gln
        35                  40                  45

Asp Phe Lys Leu Asn Arg Arg Leu His Pro Asn Leu Tyr Phe Asp Ala
    50                  55                  60

Gly Asp Ile Gln Thr Leu Lys Gln Lys Ser Arg Thr Ser His Leu His
65                  70                  75                  80

Ile Phe Arg Ala Ile Lys Ser Ala Val Thr Ile Met Leu Ser Asn Pro
                85                  90                  95

Ser Tyr Tyr Leu Pro Pro Pro Lys His Ala Glu Phe Ala Ala Lys Trp
            100                 105                 110

Asn Glu Ile Tyr Gly Asn Asn Leu Pro Pro Leu Ala Leu Tyr Cys Leu
        115                 120                 125

Leu Cys Pro Glu Asp Lys Val Ala Phe Glu Phe Val Met Glu Tyr Met
    130                 135                 140

Asp Arg Met Val Ser Tyr Lys Asp Trp Leu Val Glu Asn Ala Pro Gly
145                 150                 155                 160

Asp Glu Val Pro Val Gly His Ser Leu Thr Gly Phe Ala Thr Ala Phe
                165                 170                 175

Asp Phe Leu Tyr Asn Leu Leu Gly Asn Gln Arg Lys Gln Lys Tyr Leu
            180                 185                 190

Glu Lys Ile Trp Ile Val Thr Glu Glu Met Tyr Glu Tyr Ser Lys Ile
        195                 200                 205

Arg Ser Trp Gly Lys Gln Leu Leu His Asn His Gln Ala Thr Asn Met
    210                 215                 220

Ile Ala Leu Leu Ile Gly Ala Leu Val Thr Gly Val Asp Lys Gly Ser
225                 230                 235                 240

Lys Ala Asn Ile Trp Lys Gln Val Val Val Asp Val Met Glu Lys Thr
                245                 250                 255

Met Phe Leu Leu Lys His Ile Val Asp Gly Ser Leu Asp Glu Gly Val
            260                 265                 270

Ala Tyr Gly Ser Tyr Thr Ser Lys Ser Val Thr Gln Tyr Val Phe Leu
        275                 280                 285

Ala Gln Arg His Phe Asn Ile Asn Asn Phe Asp Asn Asn Trp Leu Lys
    290                 295                 300

Met His Phe Trp Phe Tyr Tyr Ala Thr Leu Leu Pro Gly Tyr Gln Arg
305                 310                 315                 320

Thr Val Gly Ile Ala Asp Ser Asn Tyr Asn Trp Phe Tyr Gly Pro Glu
                325                 330                 335

Ser Gln Leu Val Phe Leu Asp Lys Phe Ile Leu Gln Asn Gly Ala Gly
            340                 345                 350

Asn Trp Leu Ala Gln Gln Ile Arg Lys His Arg Pro Lys Asp Gly Pro
        355                 360                 365

Met Val Pro Ser Thr Ala Gln Arg Trp Ser Thr Leu His Thr Glu Tyr
    370                 375                 380

Ile Trp Tyr Asp Pro Thr Leu Thr Pro Gln Pro Pro Val Asp Phe Gly
385                 390                 395                 400

Thr Ala Lys Met His Thr Phe Pro Asn Trp Gly Val Val Thr Tyr Gly
                405                 410                 415

Gly Gly Leu Pro Asn Thr Gln Thr Asn Thr Phe Val Ser Phe Lys Ser
            420                 425                 430

Gly Lys Leu Gly Gly Arg Ala Val Tyr Asp Ile Val His Phe Gln Pro
        435                 440                 445

Tyr Ser Trp Ile Asp Gly Trp Arg Ser Phe Asn Pro Gly His Glu His
    450                 455                 460

Pro Asp Gln Asn Ser Phe Thr Phe Ala Pro Asn Gly Gln Val Phe Val
465                 470                 475                 480

Ser Glu Ala Leu Tyr Gly Pro Lys Leu Ser His Leu Asn Asn Val Leu
                485                 490                 495

Val Phe Ala Pro Ser Pro Ser Ser Gln Cys Asn Gln Pro Trp Glu Gly
            500                 505                 510

Gln Leu Gly Glu Cys Ala Gln Trp Leu Lys Trp Thr Gly Glu Glu Val
        515                 520                 525

Gly Asp Ala Ala Gly Glu Val Ile Thr Ala Ala Gln His Gly Asp Arg
    530                 535                 540

Met Phe Val Ser Gly Glu Ala Val Ser Ala Tyr Ser Ser Ala Met Arg
545                 550                 555                 560

Leu Lys Ser Val Tyr Arg Ala Leu Leu Leu Leu Asn Ser Gln Thr Leu
                565                 570                 575

Leu Val Val Asp His Ile Glu Arg Gln Glu Thr Ser Pro Ile Asn Ser
            580                 585                 590

Val Ser Ala Phe Phe His Asn Leu Asp Ile Asp Phe Lys Tyr Ile Pro
        595                 600                 605

Tyr Lys Phe Met Asn Arg Tyr Asn Gly Ala Met Met Asp Val Trp Asp
    610                 615                 620

Ala His Tyr Lys Met Phe Trp Phe Asp His His Gly Asn Ser Pro Val
625                 630                 635                 640

Ala Asn Ile Gln Glu Ala Glu Gln Ala Ala Glu Phe Lys Lys Arg Trp
                645                 650                 655

Thr Gln Phe Val Asn Val Thr Phe His Met Glu Ser Thr Ile Thr Arg
            660                 665                 670

Ile Ala Tyr Val Phe Tyr Gly Pro Tyr Val Asn Val Ser Ser Cys Arg
        675                 680                 685

Phe Ile Asp Ser Ser Ser Ser Gly Leu Gln Ile Ser Leu His Val Asn
    690                 695                 700

Ser Thr Glu His Ser Val Ser Val Val Thr Asp Tyr Gln Asn Leu Lys
705                 710                 715                 720

Ser Arg Phe Ser Tyr Leu Gly Phe Gly Gly Phe Ala Ser Val Ala Asn
                725                 730                 735

Gln Gly Gln Ile Thr Arg Phe Gly Leu Gly Thr Gln Glu Ile Val Asn
            740                 745                 750

Pro Val Arg His Asp Lys Val Asn Phe Pro Phe Gly Phe Lys Phe Asn
        755                 760                 765

Ile Ala Val Gly Phe Ile Leu Cys Ile Ser Leu Val Ile Leu Thr Phe
    770                 775                 780

Gln Trp Arg Phe Tyr Leu Ser Phe Arg Lys Leu Met Arg Cys Val Leu
785                 790                 795                 800

Ile Leu Val Ile Ala Leu Trp Phe Ile Glu Leu Leu Asp Val Trp Ser
                805                 810                 815

Thr Cys Thr Gln Pro Ile Cys Ala Lys Trp Thr Arg Thr Glu Ala Lys
            820                 825                 830

Ala Asn Glu Lys Val Met Ile Ser Glu Gly His His Val Asp Leu Pro
        835                 840                 845

Asn Val Ile Ile Thr Ser Leu Pro Gly Ser Gly Ala Glu Ile Leu Lys
    850                 855                 860

Gln Leu Phe Phe Asn Ser Ser Asp Phe Leu Tyr Ile Arg Ile Pro Thr
865                 870                 875                 880

Ala Tyr Met Asp Ile Pro Glu Thr Glu Phe Glu Ile Asp Ser Phe Val
                885                 890                 895

Asp Ala Cys Glu Trp Lys Val Ser Asp Ile Arg Ser Gly His Phe His
            900                 905                 910

Leu Leu Arg Gly Trp Leu Gln Ser Leu Val Gln Asp Thr Lys Leu His
        915                 920                 925

Leu Gln Asn Ile His Leu His Glu Thr Ser Arg Ser Lys Leu Ala Gln
    930                 935                 940

Tyr Phe Thr Thr Asn Lys Asp Lys Lys Arg Lys Leu Lys Arg Arg Glu
945                 950                 955                 960

Ser Leu Gln Asp Gln Arg Ser Arg Ile Lys Gly Pro Phe Asp Arg Asp
                965                 970                 975

Ala Glu Tyr Ile Arg Ala Leu Arg Arg His Leu Val Tyr Tyr Pro Ser
            980                 985                 990

Ala Arg Pro Val Leu Ser Leu Ser Ser Gly Ser Trp Thr Leu Lys Leu
        995                 1000                1005

His Phe Phe Gln Glu Val Leu Gly Thr Ser Met Arg Ala Leu Tyr Ile
    1010                1015                1020

Val Arg Asp Pro Arg Ala Trp Ile Tyr Ser Val Leu Tyr Gly Ser Lys
1025                1030                1035                1040

Pro Ser Leu Tyr Ser Leu Lys Asn Val Pro Glu His Leu Ala Lys Leu
                1045                1050                1055

Phe Lys Ile Glu Glu Gly Lys Ser Lys Cys Asn Ser Asn Ser Gly Tyr
            1060                1065                1070

Ala Phe Glu Tyr Glu Ser Leu Lys Lys Glu Leu Glu Ile Ser Gln Ser
        1075                1080                1085

Asn Ala Ile Ser Leu Leu Ser His Leu Trp Val Ala Asn Thr Ala Ala
    1090                1095                1100

Ala Leu Arg Ile Asn Thr Asp Leu Leu Pro Thr Asn Tyr His Leu Val
1105                1110                1115                1120

Lys Phe Glu Asp Ile Val His Phe Pro Gln Lys Thr Thr Glu Arg Ile
                1125                1130                1135

Phe Ala Phe Leu Gly Ile Pro Leu Ser Pro Ala Ser Leu Asn Gln Met
            1140                1145                1150

Leu Phe Ala Thr Ser Thr Asn Leu Phe Tyr Leu Pro Tyr Glu Gly Glu
        1155                1160                1165

Ile Ser Pro Ser Asn Thr Asn Ile Trp Lys Thr Asn Leu Pro Arg Asp
    1170                1175                1180

Glu Ile Lys Leu Ile Glu Asn Ile Cys Trp Thr Leu Met Asp His Leu
1185                1190                1195                1200

Gly Tyr Pro Lys Phe Met Asp
                1205